iDNA-Prot

iDNA-Prot: identification of DNA binding proteins using random forest with grey model. DNA-binding proteins play crucial roles in various cellular processes. Developing high-throughput tools that identify DNA-binding proteins rapidly and effectively is one of the major challenges in genome annotation. Although many efforts have been made in this regard, further work is needed to enhance the prediction power. By incorporating features extracted from protein sequences via the "grey model" into the general form of pseudo amino acid composition, and by adopting the random forest operation engine, we propose a new predictor, called iDNA-Prot, for classifying uncharacterized proteins as DNA-binding or non-DNA-binding based on their amino acid sequence information alone. The overall success rate of iDNA-Prot was 83.96%, obtained via jackknife tests on a newly constructed stringent benchmark dataset in which no protein has ≥25% pairwise sequence identity to any other in the same subset. In addition to achieving a high success rate, iDNA-Prot requires remarkably less computational time than the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to get the desired results.
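
The jackknife (leave-one-out) test mentioned in the abstract can be illustrated with a minimal sketch. It is not the iDNA-Prot implementation: the grey-model features and the random forest are replaced here by plain 20-dimensional amino acid composition and a one-nearest-neighbor rule, chosen only so the example stays dependency-free. The function names, the toy sequences, and their labels are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

Sample = Tuple[List[float], int]


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the normalized frequency of each standard amino acid."""
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def nearest_neighbor_label(train: Sequence[Sample], query: List[float]) -> int:
    """Stand-in classifier (assumption): label of the closest training sample."""
    def squared_distance(sample: Sample) -> float:
        return sum((a - b) ** 2 for a, b in zip(sample[0], query))
    return min(train, key=squared_distance)[1]


def jackknife_success_rate(
    features: Sequence[List[float]],
    labels: Sequence[int],
    classify: Callable[[Sequence[Sample], List[float]], int],
) -> float:
    """Hold out each sample once, predict it from the rest, return accuracy."""
    correct = 0
    for i, (held_out, true_label) in enumerate(zip(features, labels)):
        train = [(f, l) for j, (f, l) in enumerate(zip(features, labels)) if j != i]
        if classify(train, held_out) == true_label:
            correct += 1
    return correct / len(features)


# Hypothetical toy data: 1 = DNA-binding, 0 = non-DNA-binding (made-up labels).
toy_sequences = ["KKRRKKRR", "KRKRKRAK", "DDEEDDEE", "DEDEDEAD"]
toy_labels = [1, 1, 0, 0]
toy_features = [amino_acid_composition(s) for s in toy_sequences]
rate = jackknife_success_rate(toy_features, toy_labels, nearest_neighbor_label)
print(f"jackknife success rate: {rate:.2%}")
```

Because every sample is held out exactly once, the jackknife gives a single, reproducible success rate for a given dataset, which is why it is a common choice for reporting figures such as the 83.96% above.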


References in zbMATH (referenced in 16 articles)


  1. Adilina, Sheikh; Farid, Dewan Md; Shatabda, Swakkhar: Effective DNA binding protein prediction by using key features via Chou’s general PseAAC (2019)
  2. Hussain, Waqar; Khan, Yaser Daanial; Rasool, Nouman; Khan, Sher Afzal; Chou, Kuo-Chen: SPrenylC-PseAAC: a sequence-based model developed via Chou’s 5-steps rule and general PseAAC for identifying S-prenylation sites in proteins (2019)
  3. Jia, Jianhua; Li, Xiaoyan; Qiu, Wangren; Xiao, Xuan; Chou, Kuo-Chen: iPPI-PseAAC(CGR): identify protein-protein interactions by incorporating chaos game representation into PseAAC (2019)
  4. Khan, Yaser Daanial; Jamil, Mehreen; Hussain, Waqar; Rasool, Nouman; Khan, Sher Afzal; Chou, Kuo-Chen: pSSbond-PseAAC: prediction of disulfide bonding sites by integration of PseAAC and statistical moments (2019)
  5. Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen: pLoc_bal-mGneg: predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC (2018)
  6. Mei, Juan; Fu, Yi; Zhao, Ji: Analysis and prediction of ion channel inhibitors by using feature selection and Chou’s general pseudo amino acid composition (2018)
  7. Sabooh, M. Fazli; Iqbal, Nadeem; Khan, Mukhtaj; Khan, Muslim; Maqbool, H. F.: Identifying 5-methylcytosine sites in RNA sequence using composite encoding feature into Chou’s PseKNC (2018)
  8. Jiao, Xiong; Ranganathan, Shoba: Prediction of interface residue based on the features of residue interaction network (2017)
  9. Ali, Farman; Hayat, Maqsood: Machine learning approaches for discrimination of extracellular matrix proteins using hybrid feature space (2016)
  10. Jia, Jianhua; Liu, Zi; Xiao, Xuan; Liu, Bingxiang; Chou, Kuo-Chen: pSuc-Lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach (2016)
  11. Niu, Xiao-Hui; Hu, Xue-Hai; Shi, Feng; Xia, Jing-Bo: Predicting DNA binding proteins using support vector machine with hybrid fractal features (2014)
  12. Yu, Chenglong; Deng, Mo; Cheng, Shiu-Yuen; Yau, Shek-Chung; He, Rong L.; Yau, Stephen S.-T.: Protein space: a natural method for realizing the nature of protein universe (2013)
  13. Jahandideh, Samad; Mahdavi, Abbas: RFCRYS: sequence-based protein crystallization propensity prediction by means of random forest (2012)
  14. Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui: Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection (2012)
  15. Mei, Suyu: Predicting plant protein subcellular multi-localization by Chou’s PseAAC formulation based multi-label homolog knowledge transfer learning (2012)
  16. Mishra, Pooja; Nath Pandey, Paras: Elman RNN based classification of proteins sequences on account of their mutual information (2012)