Tight bounds for the sensitivity of CDAWGs with left-end edits

Research output: Contribution to journalArticlepeer-review

Abstract

Compact directed acyclic word graphs (CDAWGs) (Blumer et al. in J ACM 34(3):578–595, 1987) are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string T is obtained by merging isomorphic subtrees of the suffix tree (Weiner, in: Proceedings of the 14th annual symposium on switching and automata theory, pp 1–11, 1973) of the same string T, thus CDAWGs are a compact indexing structure. In this paper, we investigate the sensitivity of CDAWGs when a single character edit operation (insertion, deletion, or substitution) is performed at the left-end of the input string T, namely, we are interested in the worst-case increase in the size of the CDAWG after a left-end edit operation. We prove that if e is the number of edges of the CDAWG for string T, then the number of new edges added to the CDAWG after a left-end edit operation on T does not exceed e. Further, we present a matching lower bound on the sensitivity of CDAWGs for left-end insertions, and almost matching lower bounds for left-end deletions and substitutions. We then generalize our lower-bound instance for left-end insertions to leftward online construction of the CDAWG, and show that it requires Ω(n2) time for some string of length n.

Original languageEnglish
Article number12
JournalActa Informatica
Volume62
Issue number1
DOIs
Publication statusPublished - Mar 2025

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Tight bounds for the sensitivity of CDAWGs with left-end edits'. Together they form a unique fingerprint.

Cite this