TY - GEN
T1 - Bit Catastrophes for the Burrows-Wheeler Transform
AU - Giuliani, Sara
AU - Inenaga, Shunsuke
AU - Lipták, Zsuzsanna
AU - Romana, Giuseppe
AU - Sciortino, Marinella
AU - Urbina, Cristian
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - A bit catastrophe, loosely defined, occurs when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted r. We exhibit infinite families of strings in which the insertion, deletion, or substitution (respectively) of one character increases r from constant to Θ(log n), where n is the length of the string. These strings can be interpreted as examples of an increase by either a multiplicative or an additive Θ(log n) factor. As regards the multiplicative factor, they attain the upper bound of O(log n log r) given by Akagi, Funakoshi, and Inenaga [Inf & Comput. 2023], since here r = O(1). We then give examples of strings in which the insertion, deletion, or substitution (respectively) of a character increases r by a Θ(n) additive factor. These strings significantly improve the best known lower bound for an additive factor of Ω(log n) [Giuliani et al., SOFSEM 2021].
KW - Burrows-Wheeler transform
KW - Equal-letter run
KW - Repetitiveness measure
KW - Sensitivity
UR - http://www.scopus.com/inward/record.url?scp=85161240535&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85161240535&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-33264-7_8
DO - 10.1007/978-3-031-33264-7_8
M3 - Conference contribution
AN - SCOPUS:85161240535
SN - 9783031332630
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 86
EP - 99
BT - Developments in Language Theory - 27th International Conference, DLT 2023, Proceedings
A2 - Drewes, Frank
A2 - Volkov, Mikhail
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Developments in Language Theory, DLT 2023
Y2 - 12 June 2023 through 16 June 2023
ER -