Simple linear-time off-line text compression by longest-first substitution

Eyosuke Nakamura, Hideo Bannai, Shunsuke Ineriaga, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Citations (Scopus)


We consider grammar based text compression with longest first substitution where non-overlapping occurrences of a longest repeating substring of the input text are replaced by a new non-terminal symbol. We present a new text compression algorithm by simplifying the algorithm presented in [4]. We give a new formulation of the correctness proof introducing the sparse lazy suffix tree data structure. We also present another type of longest first substitution strategy that allows better compression. We show results of preliminary experiments comparing grammar sizes of the two versions of the longest first strategy and the most frequent strategy.

Original languageEnglish
Title of host publicationProceedings - DCC 2007
Subtitle of host publication2007 Data Compression Conference
Number of pages10
Publication statusPublished - 2007
EventDCC 2007: 2007 Data Compression Conference - Snowbird, UT, United States
Duration: Mar 27 2007Mar 29 2007

Publication series

NameData Compression Conference Proceedings
ISSN (Print)1068-0314


OtherDCC 2007: 2007 Data Compression Conference
Country/TerritoryUnited States
CitySnowbird, UT

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications


Dive into the research topics of 'Simple linear-time off-line text compression by longest-first substitution'. Together they form a unique fingerprint.

Cite this