Efficient LZ78 factorization of grammar compressed text

Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)

Abstract

We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight-line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size n representing a text S of length N, our algorithm computes the LZ78 factorization of S in O(n√N + m log N) time and O(n√N + m) space, where m is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the n√N term in the time and space complexities becomes either nL, where L is the length of the longest LZ78 factor, or (N − α), where α ≥ 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of S of a certain length. Since m = O(N / log_σ N), where σ is the alphabet size, the latter is asymptotically at least as fast as a linear-time algorithm which runs on the uncompressed string when σ is constant, and can be more efficient when the text is compressible, i.e. when m and n are small.
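
For orientation, the two objects named in the abstract can be sketched in a few lines of Python. This is only a naive baseline under stated assumptions, not the algorithm of the paper: it first decompresses the SLP and then factorizes the plain string with a trie, which is exactly the uncompressed-string approach the abstract compares against. The names `expand` and `lz78_factorize`, the dictionary encoding of rules, and the example grammar are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple, Union

# Assumed encoding: a rule is either one terminal character or a pair of
# variable names, mirroring Chomsky normal form for a single string.
Rule = Union[str, Tuple[str, str]]


def expand(slp: Dict[str, Rule], start: str) -> str:
    """Decompress the SLP into the single string it generates (memoized).

    Recursive, so only suitable for small illustrative grammars.
    """
    memo: Dict[str, str] = {}

    def derive(var: str) -> str:
        if var not in memo:
            rule = slp[var]
            if isinstance(rule, str):
                memo[var] = rule
            else:
                memo[var] = derive(rule[0]) + derive(rule[1])
        return memo[var]

    return derive(start)


def lz78_factorize(text: str) -> List[str]:
    """Naive LZ78: each factor is the longest earlier factor plus one character."""
    trie: dict = {}          # nested dicts; each trie node is one earlier factor
    factors: List[str] = []
    node = trie
    begin = 0                # start position of the factor being built
    for i, ch in enumerate(text):
        if ch in node:
            node = node[ch]  # still matching an earlier factor
        else:
            node[ch] = {}    # new factor = matched prefix + ch
            factors.append(text[begin:i + 1])
            node = trie
            begin = i + 1
    if begin < len(text):
        # The final factor may coincide with an earlier one.
        factors.append(text[begin:])
    return factors


# Hypothetical SLP with n = 5 rules generating a text of length N = 8.
slp = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),
    "X4": ("X3", "X3"),
    "X5": ("X4", "X4"),
}
text = expand(slp, "X5")
factors = lz78_factorize(text)
print(text)     # abababab
print(factors)  # ['a', 'b', 'ab', 'aba', 'b']
```

Here m = 5 factors are produced from a grammar of size n = 5. The sketch spends time proportional to N because it materializes the whole text; avoiding that decompression step is what the bounds in the abstract are about.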

Original language: English
Title of host publication: String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Proceedings
Publisher: Springer Verlag
Pages: 86-98
Number of pages: 13
ISBN (Print): 9783642341083
Publication status: Published - 2012
Event: 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012 - Cartagena de Indias, Colombia
Duration: Oct 21 2012 to Oct 25 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7608 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012
Country/Territory: Colombia
City: Cartagena de Indias
Period: 10/21/12 to 10/25/12

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
