Linear-size CDAWG: New repetition-aware indexing and grammar compression

Takuya Takagi, Keisuke Goto, Yuta Fujishige, Shunsuke Inenaga, Hiroki Arimura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)


In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with O(ẽ_T log n) bits of space allowing for O(log n)-time random and O(1)-time sequential accesses to edge labels, and O(m log σ + occ)-time pattern matching. Here, ẽ_T is the number of all extensions of maximal repeats in T, n and m are respectively the lengths of the text T and a given pattern, σ is the alphabet size, and occ is the number of occurrences of the pattern in T. The repetitiveness measure ẽ_T is known to be much smaller than the text length n for highly repetitive text. For constant alphabets, our L-CDAWGs achieve O(m + occ) pattern matching time with O(e_T^r log n) bits of space, which improves the pattern matching time of Belazzougui et al.’s run-length BWT-CDAWGs by a factor of log log n, with the same space complexity. Here, e_T^r is the number of right extensions of maximal repeats in T. As a byproduct, our result gives a way of constructing a straight-line program (SLP) of size O(ẽ_T) for a given text T in O(n + ẽ_T log σ) time.
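For readers unfamiliar with straight-line programs, the following is a minimal illustrative sketch, not the construction from the paper. It shows a generic SLP (a grammar in which every rule is either a single character or a concatenation of two earlier variables) and random access to one character of the derived text by descending from the start variable. All names (`RULES`, `expansion_lengths`, `access`, `expand`) are hypothetical, and the access time here is proportional to the grammar height, which is an assumption of this toy example rather than the O(log n) bound stated in the abstract.

```python
from typing import List, Tuple, Union

# A rule is either a terminal character or a pair (left, right) of indices
# of earlier rules. Rule i defines variable X_i. (Hypothetical encoding.)
Rule = Union[str, Tuple[int, int]]


def expansion_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string derived by each variable, computed bottom-up."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def access(rules: List[Rule], lengths: List[int], var: int, i: int) -> str:
    """Return character i of the expansion of X_var without expanding it."""
    if not 0 <= i < lengths[var]:
        raise IndexError("position out of range")
    while True:
        rule = rules[var]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if i < lengths[left]:
            var = left           # position lies in the left child
        else:
            i -= lengths[left]   # shift the offset into the right child
            var = right


def expand(rules: List[Rule], var: int) -> str:
    """Fully expand X_var (for checking only; takes time linear in the text)."""
    rule = rules[var]
    if isinstance(rule, str):
        return rule
    return expand(rules, rule[0]) + expand(rules, rule[1])


# Five rules derive the 8-character text "abababab":
# X0 -> a, X1 -> b, X2 -> X0 X1, X3 -> X2 X2, X4 -> X3 X3
RULES: List[Rule] = ["a", "b", (0, 1), (2, 2), (3, 3)]
LENGTHS = expansion_lengths(RULES)
```

In this toy grammar each added rule of the form X -> Y Y doubles the text length, which illustrates why a grammar can be much smaller than a highly repetitive text.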

Original language: English
Title of host publication: String Processing and Information Retrieval - 24th International Symposium, SPIRE 2017, Proceedings
Editors: Rossano Venturini, Gabriele Fici, Marinella Sciortino
Publisher: Springer Verlag
Number of pages: 13
ISBN (Print): 9783319674278
Publication status: Published - 2017
Event: 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017 - Palermo, Italy
Duration: Sept 26 2017 to Sept 29 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10508 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

