Finding optimal degenerate patterns in DNA sequences

Daisuke Shinozaki, Tatsuya Akutsu, Osamu Maruyama

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)


Motivation: Finding transcription factor binding sites in the upstream regions of given genes is an algorithmically interesting and challenging problem in computational biology. A degenerate pattern over a finite alphabet Σ is a sequence of subsets of Σ. A string over the IUPAC nucleic acid codes is a degenerate pattern over Σ = {A, C, G, T}, and such strings are one of the major pattern classes used to model transcription factor binding sites in the upstream regions of genes. However, the problem of finding a degenerate pattern consistent with both positive and negative string sets is known to be NP-complete in general. Our aim is to devise a heuristic algorithm that finds a degenerate pattern that is optimal for positive and negative string sets with respect to a given score function.

Results: We propose an enumerative algorithm called SUPERPOSITION, which finds optimal degenerate patterns using a pruning technique and works with almost all reasonable score functions. The performance of the algorithm was compared with that of other popular motif-finding algorithms, YMF, MEME and AlignACE, on various sets of co-regulated yeast genes. In the computational experiments, SUPERPOSITION outperformed the others on several gene sets.

Availability: The Python script SUPERPOSITION is available at
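
To make the definition of a degenerate pattern concrete, the following is a minimal Python sketch. It is not the SUPERPOSITION algorithm, which enumerates and prunes candidate patterns and is not described in detail here. The sketch only shows how an IUPAC string expands into a sequence of subsets of {A, C, G, T}, how such a pattern can be tested against a sequence, and how a score over positive and negative sets might be computed. The function names and `toy_score` are hypothetical choices for illustration, not taken from the paper.

```python
# Standard IUPAC nucleic acid codes mapped to the subsets of {A, C, G, T} they denote.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_subsets(pattern):
    """Expand an IUPAC string into a degenerate pattern: a list of subsets."""
    return [frozenset(IUPAC[code]) for code in pattern]


def occurs_in(pattern, sequence):
    """Return True if the degenerate pattern matches some window of the sequence."""
    subsets = to_subsets(pattern)
    k = len(subsets)
    return any(
        all(sequence[i + j] in subsets[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    )


def toy_score(pattern, positives, negatives):
    """Hypothetical score (an assumption, not the paper's score function):
    fraction of positive sequences matched minus fraction of negatives matched."""
    pos_hits = sum(occurs_in(pattern, s) for s in positives)
    neg_hits = sum(occurs_in(pattern, s) for s in negatives)
    return pos_hits / len(positives) - neg_hits / len(negatives)
```

For example, the pattern `ACGY` matches `TTACGCTT` because Y denotes {C, T}, but it does not match `TTACGATT`. A pattern consistent with both string sets in the sense of the abstract would match every positive sequence and no negative one, which gives the maximum value of 1.0 under this toy score.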

Original language: English
Pages (from-to): ii206-ii214
Issue number: SUPPL. 2
Publication status: Published - 2003

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

