Discovering best variable-length-don’t-care patterns

Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

A variable-length-don’t-care pattern (VLDC pattern) is an element of the set Π = (Σ ∪ {∗})*, where Σ is an alphabet and ∗ is a wildcard matching any string in Σ*. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one and the least common to the other. We present a practical algorithm that finds such best VLDC patterns exactly, sped up substantially by pruning heuristics. We introduce two versions of our algorithm: one employs a pattern matching machine (PMM), whereas the other uses an index structure called the Wildcard Directed Acyclic Word Graph (WDAWG). In addition, we consider a more general problem of finding the best pair ‹q, k›, where k is the window size that specifies the length of an occurrence of the VLDC pattern q matching a string w. We present three algorithms that solve this problem with pruning heuristics, using dynamic programming (DP), PMMs, and WDAWGs, respectively. Although the two problems are NP-hard, we experimentally show that our algorithms run remarkably fast.
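To make the definitions in the abstract concrete, the following is a minimal Python sketch and not the authors' algorithm: it uses neither a PMM nor a WDAWG, and it applies no pruning. It rests on several assumptions that the abstract does not state: the ASCII character `*` stands in for the wildcard ∗ and is not part of Σ; a pattern matches a string when the whole string can be obtained by replacing each wildcard with a possibly empty string; a pair ‹q, k› matches w when some substring of w of length at most k matches q; and the "goodness" of a pattern is the plain difference of match counts between the two sets. The function names `matches`, `matches_within_window`, and `score` are hypothetical.

```python
from typing import Sequence

WILDCARD = "*"  # assumption: ASCII stand-in for the wildcard, not a member of the alphabet


def matches(pattern: str, text: str) -> bool:
    """Return True if `text` can be obtained from `pattern` by replacing
    every wildcard with some (possibly empty) string."""
    segments = pattern.split(WILDCARD)
    if len(segments) == 1:
        # No wildcard at all: the pattern is a plain string.
        return pattern == text
    first, last = segments[0], segments[-1]
    if len(text) < len(first) + len(last):
        return False
    if not text.startswith(first) or not text.endswith(last):
        return False
    # Place the inner segments greedily, leftmost first, between the
    # fixed prefix and the fixed suffix.
    pos = len(first)
    end = len(text) - len(last)
    for segment in segments[1:-1]:
        idx = text.find(segment, pos, end)
        if idx == -1:
            return False
        pos = idx + len(segment)
    return True


def matches_within_window(pattern: str, text: str, k: int) -> bool:
    """Return True if some substring of `text` of length at most `k`
    matches `pattern` (brute force over all candidate windows)."""
    n = len(text)
    return any(
        matches(pattern, text[i:j])
        for i in range(n + 1)
        for j in range(i, min(n, i + k) + 1)
    )


def score(pattern: str, positives: Sequence[str], negatives: Sequence[str]) -> int:
    """Illustrative score only: matches in the first set minus matches in the second."""
    hits_pos = sum(matches(pattern, s) for s in positives)
    hits_neg = sum(matches(pattern, t) for t in negatives)
    return hits_pos - hits_neg
```

Under these assumptions, `matches("*ab*c*", "xxabyycz")` holds, while `matches_within_window("a*b", "axxxb", 3)` does not, because the only occurrence of the pattern spans five characters. A search for the best pattern would evaluate `score` over many candidate patterns, which is where the pruning heuristics and index structures described in the abstract would come in.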

Original language: English
Title of host publication: Discovery Science - 5th International Conference, DS 2002, Proceedings
Editors: Steffen Lange, Ken Satoh, Carl H. Smith
Publisher: Springer Verlag
Pages: 86-97
Number of pages: 12
ISBN (Print): 3540001883, 9783540001881
DOIs
Publication status: Published - 2002
Event: 5th International Conference on Discovery Science, DS 2002 - Lubeck, Germany
Duration: Nov 24 2002 to Nov 26 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2534
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th International Conference on Discovery Science, DS 2002
Country/Territory: Germany
City: Lubeck
Period: 11/24/02 to 11/26/02

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
