Testbed for information extraction from Deep Web

Yasuhiro Yamada, Tetsuya Nakatoh, Nick Craswell, Sachio Hirokawa

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    3 Citations (Scopus)

    Abstract

    Search results generated by searchable databases are served dynamically and are far larger than the static documents on the Web. These results pages have been referred to as the Deep Web [1]. To integrate results from different searchable databases, we need to extract the target data from their results pages. We propose a testbed for information extraction from search results. We chose 100 databases at random from 114,540 pages with search forms, so the databases are varied. We selected 51 databases whose results pages include URLs and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods, and methods for extending the target data.
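
    The abstract does not say which evaluation measures the authors suggest. As an illustration only, the sketch below assumes the common choice of precision, recall, and F1, computed by comparing the items an extraction method returns against the manually identified target items for one results page. The function name and the sample URLs are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted items against manually identified target items.

    Assumption: items are hashable values (e.g. URL strings) and are
    matched by exact equality. The paper may define matching differently.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical results page: four target URLs, three extracted, two correct.
gold_urls = ["http://a.example", "http://b.example",
             "http://c.example", "http://d.example"]
extracted_urls = ["http://a.example", "http://b.example", "http://x.example"]
p, r, f = precision_recall_f1(extracted_urls, gold_urls)
```

    Averaging these scores over all 51 annotated databases would be one way to compare two extraction methods on the same testbed.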

    Original language: English
    Title of host publication: Thirteenth International World Wide Web Conference Proceedings, WWW2004
    Pages: 1078-1079
    Number of pages: 2
    Publication status: Published - 2004
    Event: Thirteenth International World Wide Web Conference Proceedings, WWW2004 - New York, NY, United States
    Duration: May 17 2004 to May 22 2004

    All Science Journal Classification (ASJC) codes

    • Engineering (all)
