TY - GEN
T1 - Testbed for information extraction from Deep Web
AU - Yamada, Yasuhiro
AU - Nakatoh, Tetsuya
AU - Craswell, Nick
AU - Hirokawa, Sachio
PY - 2004
Y1 - 2004
N2 - Search results generated by searchable databases are served dynamically, and their volume is far larger than that of the static documents on the Web. These results pages have been referred to as the Deep Web [1]. To integrate data from different searchable databases, the target data must be extracted from their results pages. We propose a testbed for information extraction from search results. We randomly chose 100 databases from 114,540 pages with search forms, so these databases cover a good variety. Of these, we selected the 51 databases whose results pages include URLs and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods, as well as methods for extending the target data.
AB - Search results generated by searchable databases are served dynamically, and their volume is far larger than that of the static documents on the Web. These results pages have been referred to as the Deep Web [1]. To integrate data from different searchable databases, the target data must be extracted from their results pages. We propose a testbed for information extraction from search results. We randomly chose 100 databases from 114,540 pages with search forms, so these databases cover a good variety. Of these, we selected the 51 databases whose results pages include URLs and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods, as well as methods for extending the target data.
UR - http://www.scopus.com/inward/record.url?scp=19944381223&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=19944381223&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:19944381223
SN - 158113844X
T3 - Thirteenth International World Wide Web Conference Proceedings, WWW2004
SP - 1078
EP - 1079
BT - Thirteenth International World Wide Web Conference Proceedings, WWW2004
T2 - Thirteenth International World Wide Web Conference Proceedings, WWW2004
Y2 - 17 May 2004 through 22 May 2004
ER -