TY - GEN
T1 - Computing q-gram non-overlapping frequencies on SLP compressed texts
AU - Goto, Keisuke
AU - Bannai, Hideo
AU - Inenaga, Shunsuke
AU - Takeda, Masayuki
N1 - Copyright:
Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2012
Y1 - 2012
N2 - Length-q substrings, or q-grams, can represent important characteristics of text data, and determining the frequencies of all q-grams contained in the data is an important problem with many applications in the field of data mining and machine learning. In this paper, we consider the problem of calculating the non-overlapping frequencies of all q-grams in a text given in compressed form, namely, as a straight line program (SLP). We show that the problem can be solved in O(q 2 n) time and O(qn) space where n is the size of the SLP. This generalizes and greatly improves previous work (Inenaga & Bannai, 2009) which solved the problem only for q = 2 in O(n 4logn) time and O(n 3) space.
AB - Length-q substrings, or q-grams, can represent important characteristics of text data, and determining the frequencies of all q-grams contained in the data is an important problem with many applications in the field of data mining and machine learning. In this paper, we consider the problem of calculating the non-overlapping frequencies of all q-grams in a text given in compressed form, namely, as a straight line program (SLP). We show that the problem can be solved in O(q 2 n) time and O(qn) space where n is the size of the SLP. This generalizes and greatly improves previous work (Inenaga & Bannai, 2009) which solved the problem only for q = 2 in O(n 4logn) time and O(n 3) space.
UR - http://www.scopus.com/inward/record.url?scp=84856047503&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84856047503&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-27660-6_25
DO - 10.1007/978-3-642-27660-6_25
M3 - Conference contribution
AN - SCOPUS:84856047503
SN - 9783642276590
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 301
EP - 312
BT - SOFSEM 2012
T2 - 38th Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2012
Y2 - 21 January 2012 through 27 January 2012
ER -