TY - GEN
T1 - Faster subsequence and don't-care pattern matching on compressed texts
AU - Yamamoto, Takanori
AU - Bannai, Hideo
AU - Inenaga, Shunsuke
AU - Takeda, Masayuki
PY - 2011
Y1 - 2011
AB - Subsequence pattern matching problems on compressed text were first considered by Cégielski et al. (Window Subsequence Problems for Compressed Texts, Proc. CSR 2006, LNCS 3967, pp. 127-136), where the principal problem is: given a string T represented as a straight-line program (SLP) of size n and a string P of size m, compute the number of minimal subsequence occurrences of P in T. We present an O(nm) time algorithm for solving all variations of the problem introduced by Cégielski et al. This improves on the previous best known algorithm of Tiskin (Towards approximate matching in compressed strings: Local subsequence recognition, Proc. CSR 2011), which runs in O(nm log m) time. We further show that our algorithms can be modified to solve a wider range of problems in the same O(nm) time complexity, and present the first matching algorithms for patterns containing VLDC (variable-length don't-care) symbols, as well as for patterns containing FLDC (fixed-length don't-care) symbols, on SLP-compressed texts.
UR - http://www.scopus.com/inward/record.url?scp=79960081284&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79960081284&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-21458-5_27
DO - 10.1007/978-3-642-21458-5_27
M3 - Conference contribution
AN - SCOPUS:79960081284
SN - 9783642214578
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 309
EP - 322
BT - Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011, Proceedings
T2 - 22nd Annual Symposium on Combinatorial Pattern Matching, CPM 2011
Y2 - 27 June 2011 through 29 June 2011
ER -