TY - GEN
T1 - Minimal Absent Words on Run-Length Encoded Strings
AU - Akagi, Tooru
AU - Okabe, Kouta
AU - Mieno, Takuya
AU - Nakashima, Yuto
AU - Inenaga, Shunsuke
N1 - Publisher Copyright:
© Tooru Akagi, Kouta Okabe, Takuya Mieno, Yuto Nakashima, and Shunsuke Inenaga; licensed under Creative Commons License CC-BY 4.0
PY - 2022/6/1
Y1 - 2022/6/1
N2 - A string w is called a minimal absent word for another string T if w does not occur (as a substring) in T and all proper substrings of w occur in T. State-of-the-art data structures for reporting the set MAW(T) of MAWs from a given string T of length n require O(n) space, can be built in O(n) time, and can report all MAWs in O(|MAW(T)|) time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same characters a by ap where p is the length of the run. Let m be the RLE-size of string T. After categorizing the MAWs into five disjoint sets M1, M2, M3, M4, M5 using RLE, we present matching upper and lower bounds for the number of MAWs in Mi for i = 1, 2, 4, 5 in terms of RLE-size m, except for M3 whose size is unbounded by m. We then present a compact O(m)-space data structure that can report all MAWs in optimal O(|MAW(T)|) time.
AB - A string w is called a minimal absent word for another string T if w does not occur (as a substring) in T and all proper substrings of w occur in T. State-of-the-art data structures for reporting the set MAW(T) of MAWs from a given string T of length n require O(n) space, can be built in O(n) time, and can report all MAWs in O(|MAW(T)|) time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same characters a by ap where p is the length of the run. Let m be the RLE-size of string T. After categorizing the MAWs into five disjoint sets M1, M2, M3, M4, M5 using RLE, we present matching upper and lower bounds for the number of MAWs in Mi for i = 1, 2, 4, 5 in terms of RLE-size m, except for M3 whose size is unbounded by m. We then present a compact O(m)-space data structure that can report all MAWs in optimal O(|MAW(T)|) time.
KW - combinatorics on words
KW - minimal absent words
KW - run-length encoding
KW - string algorithms
UR - http://www.scopus.com/inward/record.url?scp=85134322387&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134322387&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.CPM.2022.27
DO - 10.4230/LIPIcs.CPM.2022.27
M3 - Conference contribution
AN - SCOPUS:85134322387
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022
A2 - Bannai, Hideo
A2 - Holub, Jan
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022
Y2 - 27 June 2022 through 29 June 2022
ER -