TY - GEN
T1 - Efficient LZ78 factorization of grammar compressed text
AU - Bannai, Hideo
AU - Inenaga, Shunsuke
AU - Takeda, Masayuki
PY - 2012
Y1 - 2012
N2 - We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size n representing a text S of length N, our algorithm computes the LZ78 factorization of S in O(n√N + m log N) time and O(n√N + m) space, where m is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the n√N term in the time and space complexities becomes either nL, where L is the length of the longest LZ78 factor, or (N - α), where α ≥ 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of S of a certain length. Since m = O(N / log_σ N), where σ is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when σ is constant, and can be more efficient when the text is compressible, i.e. when m and n are small.
AB - We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size n representing a text S of length N, our algorithm computes the LZ78 factorization of S in O(n√N + m log N) time and O(n√N + m) space, where m is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the n√N term in the time and space complexities becomes either nL, where L is the length of the longest LZ78 factor, or (N - α), where α ≥ 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of S of a certain length. Since m = O(N / log_σ N), where σ is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when σ is constant, and can be more efficient when the text is compressible, i.e. when m and n are small.
UR - http://www.scopus.com/inward/record.url?scp=84867496904&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867496904&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-34109-0_10
DO - 10.1007/978-3-642-34109-0_10
M3 - Conference contribution
AN - SCOPUS:84867496904
SN - 9783642341083
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 86
EP - 98
BT - String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Proceedings
PB - Springer Verlag
T2 - 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012
Y2 - 21 October 2012 through 25 October 2012
ER -