GeneScout

GeneScout: a data mining system for predicting vertebrate genes in genomic DNA sequences. Automated detection or prediction of coding sequences within genomic DNA has been a major rate-limiting step in the pursuit of vertebrate genes. Currently available programs are far from powerful enough to elucidate a gene structure completely. In this paper, we present a new system, called GeneScout, for predicting gene structures in vertebrate genomic DNA. The system contains specially designed hidden Markov models (HMMs) for detecting functional sites such as protein-translation start sites and mRNA splicing junction donor and acceptor sites. An HMM is also proposed for computing exon coding potential. Our main hypothesis is that, given a vertebrate genomic DNA sequence S, it is always possible to construct a directed acyclic graph G such that the path for the actual coding region of S is in the set of all paths on G. Thus, the gene detection problem is reduced to that of analyzing the paths in the graph G. A dynamic programming algorithm is used to find the optimal path in G. The proposed system is trained using an expectation-maximization algorithm, and its performance on vertebrate gene prediction is evaluated using 10-fold cross-validation. Experimental results show that the proposed system performs well and is comparable to existing gene discovery tools.
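The reduction described above, from gene detection to finding an optimal path in a directed acyclic graph, can be illustrated with a minimal sketch. This is not GeneScout's actual algorithm: the node numbering, the edge scores, and the function name `best_path` are all hypothetical. The sketch assumes candidate sites are indexed by sequence position, so every edge runs from a lower to a higher index, and that each edge carries a precomputed score (for example, a coding-potential score).

```python
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float]  # (tail node, head node, score); hypothetical layout


def best_path(num_nodes: int, edges: List[Edge], source: int, sink: int) -> Tuple[float, List[int]]:
    """Return (score, path) for the highest-scoring source-to-sink path.

    Assumes tail < head for every edge, so increasing node index is a
    topological order of the DAG.
    """
    for tail, head, _ in edges:
        if not 0 <= tail < head < num_nodes:
            raise ValueError("each edge must satisfy 0 <= tail < head < num_nodes")

    neg_inf = float("-inf")
    best = [neg_inf] * num_nodes          # best score of any path reaching each node
    prev: List[Optional[int]] = [None] * num_nodes
    best[source] = 0.0

    # Sorting by tail guarantees best[tail] is final before its out-edges are relaxed.
    for tail, head, score in sorted(edges):
        if best[tail] == neg_inf:
            continue                      # tail is unreachable from the source
        candidate = best[tail] + score
        if candidate > best[head]:
            best[head] = candidate
            prev[head] = tail

    if best[sink] == neg_inf:
        return neg_inf, []                # no path from source to sink

    # Walk the predecessor links back from the sink to recover the path.
    path: List[int] = []
    node: Optional[int] = sink
    while node is not None:
        path.append(node)
        node = prev[node]
    return best[sink], path[::-1]


# Toy graph: 0 = start site, 1 and 3 = donor sites, 2 = acceptor site, 4 = stop.
toy_edges: List[Edge] = [
    (0, 1, 2.0), (0, 3, 2.5), (1, 2, 0.5), (1, 4, -1.0),
    (2, 3, 1.5), (2, 4, 1.0), (3, 4, 1.0),
]
score, path = best_path(5, toy_edges, source=0, sink=4)
print(score, path)
```

Because each edge is relaxed exactly once in topological order, the running time is dominated by the sort; how the real system builds G and scores its edges is described in the paper, not here.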


References in zbMATH (referenced in 7 articles)

  1. Canan Has; Jens Allmer: PGMiner: Complete proteogenomics workflow; from data acquisition to result visualization (2017) not zbMATH
  2. Jiang, Zhu; Huang, Yong-Xuan: Parametric calibration of speed-density relationships in mesoscopic traffic simulator with data mining (2009) ioport
  3. Zhong, Sheng; Yang, Zhiqiang; Chen, Tingting: k-anonymous data collection (2009)
  4. Hsu, Chung-Chian; Chen, Chin-Long; Su, Yu-Wei: Hierarchical clustering of mixed data based on distance hierarchy (2007) ioport
  5. Shah, Divyesh; Zhong, Sheng: Two methods for privacy preserving data mining with malicious participants (2007)
  6. Zhong, Sheng: Privacy-preserving algorithms for distributed mining of frequent itemsets (2007)
  7. Yin, Michael M.; Wang, Jason T. L.: GeneScout: a data mining system for predicting vertebrate genes in genomic DNA sequences (2004) ioport