Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm.

MOTIVATION: The discovery of motifs in biological sequences is an important problem.

RESULTS: This paper presents a new algorithm for the discovery of rigid patterns (motifs) in biological sequences. The method is combinatorial in nature and produces all patterns that appear in at least a user-defined minimum number of sequences, yet it remains efficient by avoiding enumeration of the entire pattern space. Furthermore, the reported patterns are maximal: no reported pattern can be made more specific while still appearing at exactly the same positions within the input sequences. The effectiveness of the proposed approach is showcased on a number of test cases which aim to: (i) validate the approach through the discovery of previously reported patterns; (ii) demonstrate the capability to automatically identify highly selective patterns particular to the sequences under consideration. Finally, experimental analysis indicates that the algorithm is output sensitive, i.e. its running time is quasi-linear in the size of the generated output.
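The two definitions used in the abstract, support (the number of sequences containing a pattern) and specificity at fixed positions, can be illustrated with a small brute-force sketch. This is not the TEIRESIAS algorithm itself, which is designed precisely to avoid this kind of exhaustive scanning. The wildcard symbol `.` and the helper names `occurrences`, `support` and `can_be_specialized_in_place` are assumptions made for illustration, and the check covers only the replacement of a wildcard by a residue, not the extension of a pattern to the left or right.

```python
from typing import List, Set, Tuple

WILDCARD = "."  # assumed don't-care symbol; matches any single residue


def occurrences(pattern: str, sequences: List[str]) -> Set[Tuple[int, int]]:
    """Return (sequence index, offset) pairs where the rigid pattern matches."""
    hits = set()
    for i, seq in enumerate(sequences):
        for off in range(len(seq) - len(pattern) + 1):
            window = seq[off:off + len(pattern)]
            if all(p == WILDCARD or p == c for p, c in zip(pattern, window)):
                hits.add((i, off))
    return hits


def support(pattern: str, sequences: List[str]) -> int:
    """Number of distinct sequences that contain at least one occurrence."""
    return len({i for i, _ in occurrences(pattern, sequences)})


def can_be_specialized_in_place(pattern: str, sequences: List[str]) -> bool:
    """True if some wildcard could become a fixed residue without losing
    any occurrence, i.e. the pattern is not maximal in that respect."""
    hits = occurrences(pattern, sequences)
    for pos, symbol in enumerate(pattern):
        if symbol != WILDCARD:
            continue
        # Residues observed under this wildcard across all occurrences.
        residues = {sequences[i][off + pos] for i, off in hits}
        if len(residues) == 1:
            return True
    return False


if __name__ == "__main__":
    seqs = ["xABCDy", "ABCD", "zzAQCD"]  # toy input, not real biological data
    print(support("A.CD", seqs))                      # pattern found in all three
    print(can_be_specialized_in_place("A.CD", seqs))  # wildcard covers B and Q
```

With a minimum support of 3, `A.CD` would be reported for the toy input, whereas `ABCD` would not, since it occurs in only two of the three sequences.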

References in zbMATH (referenced in 16 articles)

  1. Hämäläinen, Wilhelmiina; Webb, Geoffrey I.: A tutorial on statistically sound pattern discovery (2019)
  2. O’Donnell, Brian; Maurer, Alexander; Papandreou-Suppappola, Antonia: Biosequence time-frequency processing: pathogen detection and identification (2015)
  3. Song, Tao; Gu, Hong: Discovering short linear protein motif based on selective training of profile hidden Markov models (2015)
  4. Abbass, Mostafa M.; Bahig, Hazem M.: An efficient algorithm to identify DNA motifs (2013)
  5. Bansal, Nikhil; Lewenstein, Moshe; Ma, Bin; Zhang, Kaizhong: On the longest common rigid subsequence problem (2010)
  6. Hu, Jianjun; Zhang, Fan: Bayesmotif: de novo protein sorting motif discovery from impure datasets (2010)
  7. Ruan, Guangchen; Tan, Ying: A three-layer back-propagation neural network for spam detection using artificial immune concentration (2010)
  8. Exarchos, Themis P.; Tsipouras, Markos G.; Papaloukas, Costas; Fotiadis, Dimitrios I.: An optimized sequential pattern matching methodology for sequence classification (2009)
  9. Dogruel, Mutlu; Down, Thomas A.; Hubbard, Tim J. P.: Nestedmica as an ab initio protein motif discovery tool (2008)
  10. Rokach, Lior; Romano, Roni; Maimon, Oded: Negation recognition in medical narrative reports (2008)
  11. Elloumi, Fathi; Nason, Martha: SEARCHPATTOOL: a new method for mining the most specific frequent patterns for binding sites with application to prokaryotic DNA sequences (2007)
  12. Lonardi, Stefano; Lin, Jessica; Keogh, Eamonn; Chiu, Bill ‘Yuan-chi’: Efficient discovery of unusual patterns in time series (2007)
  13. Mahony, Shaun; Benos, Panayiotis V.; Smith, Terry J.; Golden, Aaron: Self-organizing neural networks to support the discovery of DNA-binding motifs (2006)
  14. Mahony, Shaun; Hendrix, David; Smith, Terry J.; Golden, Aaron: Self-organizing maps of position weight matrices for motif discovery in biological sequences (2005)
  15. Pelfrêne, Johann; Abdeddaïm, Saïd; Alexandre, Joël: Extracting approximate patterns (2005)
  16. Hernandez, David; Gras, Robin; Appel, Ron: MoDEL: an efficient strategy for ungapped local multiple alignment (2004)