TY - GEN
T1 - Approximating Document Frequency for Self-Index based Top-k Document Retrieval
AU - Suzuki, Tokinori
AU - Fujii, Atsushi
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/4/27
Y1 - 2015/4/27
N2 - Top-k document retrieval, which returns the documents most relevant to a query, is an essential task for many applications. One promising index framework combines an FM-index and a wavelet tree to support efficient top-k document retrieval. This index, however, has difficulty handling document frequency (DF) at search time because its indexed terms are all substrings of the document collection. Previous work either exhaustively searches every part of the index for DF calculation, even though most documents are not relevant, or stores recalculated DF values in a huge amount of additional space. In this paper, we propose two methods that approximate the DF of a query term by exploiting information obtained while traversing the index structures. Experimental results show that our methods achieve nearly the same effectiveness as exhaustive search while requiring about half of its search time.
AB - Top-k document retrieval, which returns the documents most relevant to a query, is an essential task for many applications. One promising index framework combines an FM-index and a wavelet tree to support efficient top-k document retrieval. This index, however, has difficulty handling document frequency (DF) at search time because its indexed terms are all substrings of the document collection. Previous work either exhaustively searches every part of the index for DF calculation, even though most documents are not relevant, or stores recalculated DF values in a huge amount of additional space. In this paper, we propose two methods that approximate the DF of a query term by exploiting information obtained while traversing the index structures. Experimental results show that our methods achieve nearly the same effectiveness as exhaustive search while requiring about half of its search time.
UR - http://www.scopus.com/inward/record.url?scp=84947746015&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947746015&partnerID=8YFLogxK
U2 - 10.1109/WAINA.2015.68
DO - 10.1109/WAINA.2015.68
M3 - Conference contribution
AN - SCOPUS:84947746015
T3 - Proceedings - IEEE 29th International Conference on Advanced Information Networking and Applications Workshops, WAINA 2015
SP - 541
EP - 546
BT - Proceedings - IEEE 29th International Conference on Advanced Information Networking and Applications Workshops, WAINA 2015
A2 - Barolli, Leonard
A2 - Takizawa, Makoto
A2 - Xhafa, Fatos
A2 - Enokido, Tomoya
A2 - Park, Jong Hyuk
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 29th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2015
Y2 - 25 March 2015 through 27 March 2015
ER -