Approximating Document Frequency for Self-Index based Top-k Document Retrieval

Tokinori Suzuki, Atsushi Fujii

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Top-k document retrieval, which returns the documents most relevant to a query, is an essential task for many applications. One promising index framework combines an FM-index with a wavelet tree to support efficient top-k document retrieval. This index, however, has difficulty handling document frequency (DF) at search time because the indexed terms are all substrings of the document collection. Previous work either exhaustively searches all parts of the index for DF calculation, even though most of the documents are not relevant, or stores precalculated DF values at the cost of a large amount of additional space. In this paper, we propose two methods that approximate the DF of a query term by exploiting information obtained while traversing the index structures. Experimental results show that our methods achieve almost the same effectiveness as exhaustive search while taking about half the search time.
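
To make the DF problem concrete, the following is a minimal sketch of the exhaustive baseline the abstract refers to, not the authors' approximation methods. It assumes a naive suffix array in place of the FM-index and a plain Python list in place of the wavelet tree over the document array; all function names (`build_index`, `sa_range`, `exhaustive_df`) are hypothetical and chosen for illustration only.

```python
from bisect import bisect_left, bisect_right


def build_index(docs, sep="\x00"):
    """Concatenate the documents and build a naive suffix array plus a
    document array (doc ID of each suffix, in suffix-array order).
    Assumption: `sep` never occurs inside a document or a query."""
    text = ""
    doc_of_pos = []
    for doc_id, doc in enumerate(docs):
        chunk = doc + sep
        text += chunk
        doc_of_pos.extend([doc_id] * len(chunk))
    # Quadratic-space sort; fine for a toy example, not for real collections.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    doc_array = [doc_of_pos[i] for i in suffix_array]
    return text, suffix_array, doc_array


def sa_range(text, suffix_array, pattern):
    """Return the half-open suffix-array range [lo, hi) of suffixes that
    start with `pattern`. Truncated suffixes keep the sorted order, so a
    binary search over them is valid."""
    m = len(pattern)
    prefixes = [text[i:i + m] for i in suffix_array]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def exhaustive_df(text, suffix_array, doc_array, pattern):
    """Document frequency by exhaustive search: visit every occurrence in
    the range and count the distinct document IDs."""
    lo, hi = sa_range(text, suffix_array, pattern)
    return len(set(doc_array[lo:hi]))


docs = ["banana bread", "bandana", "apple"]
text, sa, da = build_index(docs)
df_an = exhaustive_df(text, sa, da, "an")    # substring of docs 0 and 1
df_a = exhaustive_df(text, sa, da, "a")      # substring of all three docs
df_xyz = exhaustive_df(text, sa, da, "xyz")  # occurs nowhere
```

Because any substring can be a query term, the range for a frequent term can cover a large share of the document array, and the exhaustive count must touch every entry in it. That cost is what the approximation methods in the paper aim to avoid.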

Original language: English
Title of host publication: Proceedings - IEEE 29th International Conference on Advanced Information Networking and Applications Workshops, WAINA 2015
Editors: Leonard Barolli, Makoto Takizawa, Fatos Xhafa, Tomoya Enokido, Jong Hyuk Park
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 541-546
Number of pages: 6
ISBN (Electronic): 9781479917747
Publication status: Published - Apr 27 2015
Externally published: Yes
Event: 29th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2015 - Gwangju, Korea, Republic of
Duration: Mar 25 2015 to Mar 27 2015

Publication series

Name: Proceedings - IEEE 29th International Conference on Advanced Information Networking and Applications Workshops, WAINA 2015


All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
