TY - GEN
T1 - Tight bounds on the maximum number of shortest unique substrings
AU - Mieno, Takuya
AU - Inenaga, Shunsuke
AU - Bannai, Hideo
AU - Takeda, Masayuki
N1 - Publisher Copyright: © Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda.
PY - 2017/7/1
Y1 - 2017/7/1
N2 - A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S that contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t], all the SUSs for interval [s, t] can be answered quickly. When s = t, we call the SUSs for [s, t] point SUSs, and when s ≤ t, we call the SUSs for [s, t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme that answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. We also consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.
UR - http://www.scopus.com/inward/record.url?scp=85027271339&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027271339&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.CPM.2017.24
DO - 10.4230/LIPIcs.CPM.2017.24
M3 - Conference contribution
AN - SCOPUS:85027271339
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017
A2 - Radoszewski, Jakub
A2 - Kärkkäinen, Juha
A2 - Rytter, Wojciech
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017
Y2 - 4 July 2017 through 6 July 2017
ER -