Computing q-gram non-overlapping frequencies on SLP compressed texts

Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Length-q substrings, or q-grams, can represent important characteristics of text data, and determining the frequencies of all q-grams contained in the data is an important problem with many applications in the field of data mining and machine learning. In this paper, we consider the problem of calculating the non-overlapping frequencies of all q-grams in a text given in compressed form, namely, as a straight line program (SLP). We show that the problem can be solved in O(q²n) time and O(qn) space where n is the size of the SLP. This generalizes and greatly improves previous work (Inenaga & Bannai, 2009) which solved the problem only for q = 2 in O(n⁴ log n) time and O(n³) space.
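To make the problem statement concrete, the following is a minimal Python sketch of a naive baseline. It is not the algorithm of the paper: it first decompresses the SLP and then counts on the plain text, so its running time depends on the length of the decompressed text rather than on the SLP size n. The rule encoding (a dictionary mapping each variable to either a single character or a pair of variables) and the function names `expand_slp` and `non_overlapping_qgram_freqs` are assumptions made for illustration. The non-overlapping frequency of a q-gram is taken here to be the maximum number of pairwise non-overlapping occurrences, which a greedy leftmost scan attains.

```python
from collections import defaultdict


def expand_slp(rules, start):
    """Decompress an SLP into the text it derives.

    Assumed encoding: rules[var] is either a one-character string
    (terminal rule) or a pair (left_var, right_var).
    """
    out = []
    stack = [start]
    while stack:
        rhs = rules[stack.pop()]
        if isinstance(rhs, str):
            out.append(rhs)
        else:
            left, right = rhs
            # Push right first so that left is expanded first.
            stack.append(right)
            stack.append(left)
    return "".join(out)


def non_overlapping_qgram_freqs(text, q):
    """Count, for every q-gram, a maximum set of non-overlapping occurrences.

    Greedy rule: accept an occurrence only if it starts at or after the
    end of the previously accepted occurrence of the same q-gram.
    """
    counts = defaultdict(int)
    next_free = defaultdict(int)  # first position where a new occurrence may start
    for i in range(len(text) - q + 1):
        gram = text[i:i + q]
        if i >= next_free[gram]:
            counts[gram] += 1
            next_free[gram] = i + q
    return dict(counts)


# Hypothetical SLP deriving "ababa".
rules = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),  # ab
    "X4": ("X3", "X3"),  # abab
    "X5": ("X4", "X1"),  # ababa
}
text = expand_slp(rules, "X5")
freqs = non_overlapping_qgram_freqs(text, 3)
```

In this example the 3-gram "aba" occurs at positions 0 and 2 of "ababa", but the two occurrences overlap, so its non-overlapping frequency is 1 while its ordinary frequency is 2.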

Original language: English
Title of host publication: SOFSEM 2012
Subtitle of host publication: Theory and Practice of Computer Science - 38th Conference on Current Trends in Theory and Practice of Computer Science, Proceedings
Pages: 301-312
Number of pages: 12
Publication status: Published - 2012
Event: 38th Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2012 - Spindleruv Mlyn, Czech Republic
Duration: Jan 21 2012 to Jan 27 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7147 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
