Tight bounds on the maximum number of shortest unique substrings

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Citations (Scopus)

Abstract

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S, if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S which contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUSs for interval [s, t] can be answered quickly. When s = t, we call the SUSs for [s, t] as point SUSs, and when s ≤ t, we call the SUSs for [s, t] as interval SUSs. There exist optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: Namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.

Original languageEnglish
Title of host publication28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017
EditorsJakub Radoszewski, Juha Karkkainen, Jakub Radoszewski, Wojciech Rytter
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic)9783959770392
DOIs
Publication statusPublished - Jul 1 2017
Event28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017 - Warsaw, Poland
Duration: Jul 4 2017Jul 6 2017

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume78
ISSN (Print)1868-8969

Other

Other28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017
Country/TerritoryPoland
CityWarsaw
Period7/4/177/6/17

All Science Journal Classification (ASJC) codes

  • Software

Fingerprint

Dive into the research topics of 'Tight bounds on the maximum number of shortest unique substrings'. Together they form a unique fingerprint.

Cite this