Bit Catastrophes for the Burrows-Wheeler Transform

Sara Giuliani, Shunsuke Inenaga, Zsuzsanna Lipták, Giuseppe Romana, Marinella Sciortino, Cristian Urbina

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

7 Citations (Scopus)

Abstract

A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted r. We exhibit infinite families of strings in which insertion, deletion, or substitution of one character increases r from constant to Θ(log n), where n is the length of the string. These strings can be interpreted as examples of an increase by either a multiplicative or an additive Θ(log n) factor. As regards the multiplicative factor, they attain the upper bound of O(log n log r) given by Akagi, Funakoshi, and Inenaga [Inf. & Comput. 2023], since here r = O(1). We then give examples of strings in which insertion, deletion, or substitution of a character increases r by a Θ(√n) additive factor. These strings significantly improve the best known lower bound for an additive factor, which was Ω(log n) [Giuliani et al., SOFSEM 2021].
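To make the parameter r concrete, the following is a minimal sketch, not the authors' construction. It assumes the BWT is taken as the last column of the sorted rotations of the string, with no end-of-string marker; the helper names `bwt`, `num_runs`, `r`, and `max_r_after_substitution` are hypothetical and chosen only for illustration. The brute-force search over single substitutions is a naive way to observe how much r can change on small inputs, and it does not reproduce the string families from the paper.

```python
def bwt(s: str) -> str:
    """Last column of the lexicographically sorted rotations of s.

    Assumption: no end-of-string marker is appended.
    """
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def num_runs(s: str) -> int:
    """Number of maximal equal-letter runs in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def r(s: str) -> int:
    """Number of equal-letter runs of the BWT of s."""
    return num_runs(bwt(s))


def max_r_after_substitution(s: str, alphabet: str) -> int:
    """Largest r over all strings obtained from s by one substitution.

    Brute force: tries every position and every other letter.
    """
    best = 0
    for i in range(len(s)):
        for c in alphabet:
            if c != s[i]:
                best = max(best, r(s[:i] + c + s[i + 1:]))
    return best


if __name__ == "__main__":
    word = "abaababa"  # a Fibonacci word of length 8
    print(bwt(word), r(word))  # bbbaaaaa 2
    print(max_r_after_substitution(word, "ab"))
```

Sorting all rotations explicitly takes quadratic space, so this is only suitable for experimenting with short strings; practical tools build the BWT from a suffix array instead.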

Original language: English
Title of host publication: Developments in Language Theory - 27th International Conference, DLT 2023, Proceedings
Editors: Frank Drewes, Mikhail Volkov
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 86-99
Number of pages: 14
ISBN (Print): 9783031332630
Publication status: Published - 2023
Event: 27th International Conference on Developments in Language Theory, DLT 2023 - Umeå, Sweden
Duration: Jun 12 2023 to Jun 16 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13911 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Developments in Language Theory, DLT 2023
Country/Territory: Sweden
City: Umeå
Period: 6/12/23 to 6/16/23

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
