iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets.

Knowledge of protein-protein interactions and their binding sites is indispensable for an in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods that identify protein-protein binding sites (PPBSs) in a timely fashion from sequence information alone, because information obtained in this way can be used for both biomedical research and drug development. To address this challenge, we have proposed a new predictor, called iPPBS-Opt, in which we have used: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) the ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests targeting experimentally confirmed results have demonstrated that the new predictor is very promising, implying that the aforementioned practices are indeed very effective. In particular, using wavelets to express protein/peptide sequences might be the key to grasping the problem’s essence, fully consistent with the findings that many important biological functions of proteins can be elucidated through their low-frequency internal motions. For the convenience of most experimental scientists, we have provided a step-by-step guide on how to use the predictor’s web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to get the desired results without the need to go through the complicated mathematical equations involved.
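
The dataset-optimization idea in item (1) of the abstract can be illustrated with a small sketch. The abstract does not give the actual KNNC and IHTS procedures, so the code below substitutes two generic stand-ins and should be read as an assumption, not as the authors' method: a neighborhood-cleaning rule that drops majority-class samples with a minority-class sample among their k nearest neighbours, and a SMOTE-style interpolation that creates hypothetical minority-class samples. The function names `knn_clean` and `insert_hypothetical`, their parameters, and the label convention (0 = majority, 1 = minority) are all hypothetical.

```python
import math
import random


def knn_clean(X, y, k=3, majority=0):
    """Generic neighbourhood cleaning (assumed stand-in for KNNC).

    Keeps every minority sample, and keeps a majority sample only if
    all of its k nearest neighbours are also majority samples.
    """
    keep = []
    for i, xi in enumerate(X):
        if y[i] != majority:
            keep.append(i)  # minority samples are never removed
            continue
        # indices of the k nearest *other* samples (Euclidean distance)
        nearest = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        if all(y[j] == majority for j in nearest):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def insert_hypothetical(X, y, n_new, minority=1, seed=0):
    """SMOTE-style interpolation (assumed stand-in for IHTS).

    Appends n_new synthetic minority samples, each lying on the line
    segment between two randomly chosen minority samples.
    """
    rng = random.Random(seed)
    pool = [x for x, label in zip(X, y) if label == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        a, b = rng.choice(pool), rng.choice(pool)
        t = rng.random()  # interpolation weight in [0, 1)
        X_out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_out.append(minority)
    return X_out, y_out
```

Applying `knn_clean` first and `insert_hypothetical` second thins out majority samples near the class boundary and then tops up the minority class, so the training set moves toward balance from both directions. Whether the original predictor applies its treatments in this order is not stated in the abstract.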

References in zbMATH (referenced in 15 articles)

  1. Ahmad, Jamal; Hayat, Maqsood: MFSC: multi-voting based feature selection for classification of Golgi proteins by adopting the general form of Chou’s PseAAC components (2019)
  2. Jia, Jianhua; Li, Xiaoyan; Qiu, Wangren; Xiao, Xuan; Chou, Kuo-Chen: iPPI-PseAAC(CGR): identify protein-protein interactions by incorporating chaos game representation into PseAAC (2019)
  3. Wang, Lidong; Zhang, Ruijun; Mu, Yashuang: Fu-SulfPred: identification of protein S-sulfenylation sites by fusing forests via Chou’s general PseAAC (2019)
  4. Arif, Muhammad; Hayat, Maqsood; Jan, Zahoor: IMem-2LSAAC: a two-level model for discrimination of membrane proteins and their types by extending the notion of SAAC into Chou’s pseudo amino acid composition (2018)
  5. Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen: pLoc_bal-mGneg: predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC (2018)
  6. Sabooh, M. Fazli; Iqbal, Nadeem; Khan, Mukhtaj; Khan, Muslim; Maqbool, H. F.: Identifying 5-methylcytosine sites in RNA sequence using composite encoding feature into Chou’s PseKNC (2018)
  7. Sankari, E. Siva; Manimegalai, D.: Predicting membrane protein types by incorporating a novel feature set into Chou’s general PseAAC (2018)
  8. Tarafder, Sumit; Toukir Ahmed, Md.; Iqbal, Sumaiya; Tamjidul Hoque, Md; Sohel Rahman, M.: RBSURFpred: modeling protein accessible surface area in real and binary space using regularized and optimized regression (2018)
  9. Zhang, Shengli; Duan, Xin: Prediction of protein subcellular localization with oversampling approach and Chou’s general PseAAC (2018)
  10. Jiao, Xiong; Ranganathan, Shoba: Prediction of interface residue based on the features of residue interaction network (2017)
  11. Khan, Muslim; Hayat, Maqsood; Khan, Sher Afzal; Ahmad, Saeed; Iqbal, Nadeem: Bi-PSSM: position specific scoring matrix based intelligent computational model for identification of mycobacterial membrane proteins (2017)
  12. Pai, Priyadarshini P.; Dash, Tirtharaj; Mondal, Sukanta: Sequence-based discrimination of protein-RNA interacting residues using a probabilistic approach (2017)
  13. Jia, Jianhua; Liu, Zi; Xiao, Xuan; Liu, Bingxiang; Chou, Kuo-Chen: pSuc-Lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach (2016)
  14. Jiao, Ya-Sen; Du, Pu-Feng: Prediction of Golgi-resident protein types using general form of Chou’s pseudo-amino acid compositions: approaches with minimal redundancy maximal relevance feature selection (2016)
  15. Yang, Lei; Wang, Shiyuan; Zhou, Meng; Chen, Xiaowen; Zuo, Yongchun; Lv, Yingli: Characterization of BioPlex network by topological properties (2016)