TY - GEN
T1 - Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph
AU - Arimura, Hiroki
AU - Inenaga, Shunsuke
AU - Kobayashi, Yasuaki
AU - Nakashima, Yuto
AU - Sue, Mizuki
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
PY - 2023
Y1 - 2023
N2 - In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, called the Compact Directed Acyclic Word Graph (CDAWG), of size e for a text T of length n into other text indexing structures for the same text, suitable for highly repetitive texts: the run-length BWT of size r, the irreducible PLCP array of size r, and the quasi-irreducible LPF array of size e, as well as the lex-parse of size O(r) and the LZ77-parse of size z, where $$r, z \leqslant e$$. As main results, we showed that the above structures can be optimally computed from either the CDAWG for T stored in read-only memory or its self-index version of size e without a text in O(e) worst-case time and words of working space. To obtain the above results, we devised techniques for enumerating a particular subset of suffixes in the lexicographic and text orders using the forward and backward search on the CDAWG by extending the result by Belazzougui et al. in 2015.
AB - In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, called the Compact Directed Acyclic Word Graph (CDAWG), of size e for a text T of length n into other text indexing structures for the same text, suitable for highly repetitive texts: the run-length BWT of size r, the irreducible PLCP array of size r, and the quasi-irreducible LPF array of size e, as well as the lex-parse of size O(r) and the LZ77-parse of size z, where $$r, z \leqslant e$$. As main results, we showed that the above structures can be optimally computed from either the CDAWG for T stored in read-only memory or its self-index version of size e without a text in O(e) worst-case time and words of working space. To obtain the above results, we devised techniques for enumerating a particular subset of suffixes in the lexicographic and text orders using the forward and backward search on the CDAWG by extending the result by Belazzougui et al. in 2015.
KW - Highly-repetitive text
KW - longest common prefix
KW - suffix tree
UR - http://www.scopus.com/inward/record.url?scp=85174450924&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174450924&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-43980-3_3
DO - 10.1007/978-3-031-43980-3_3
M3 - Conference contribution
AN - SCOPUS:85174450924
SN - 9783031439797
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 28
EP - 34
BT - String Processing and Information Retrieval - 30th International Symposium, SPIRE 2023, Proceedings
A2 - Nardini, Franco Maria
A2 - Pisanti, Nadia
A2 - Venturini, Rossano
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th International Symposium on String Processing and Information Retrieval, SPIRE 2023
Y2 - 26 September 2023 through 28 September 2023
ER -