Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization.

Predicting the subcellular localization of human proteins is a challenging problem, especially when unknown query proteins have no significant homology to proteins of known subcellular location and when more locations need to be covered. To tackle the challenge, protein samples are represented by hybridizing the gene ontology (GO) database and amphiphilic pseudo amino acid composition (PseAA). Based on this representation, a novel ensemble classifier, called “Hum-PLoc”, was developed by fusing many basic individual classifiers through a voting system. The “engine” of these basic classifiers is the KNN (K-nearest neighbor) rule. As a demonstration, the ensemble classifier was tested on human proteins among the following 12 locations: (1) centriole; (2) cytoplasm; (3) cytoskeleton; (4) endoplasmic reticulum; (5) extracell; (6) Golgi apparatus; (7) lysosome; (8) microsome; (9) mitochondrion; (10) nucleus; (11) peroxisome; (12) plasma membrane. To remove redundancy and homology bias, none of the proteins investigated had ≥ 25% sequence identity to any other protein in the same subcellular location. The overall success rates obtained via the jackknife cross-validation test and the independent dataset test were 81.1% and 85.0%, respectively, which are more than 50% higher than those obtained by the other existing methods on the same stringent datasets. Furthermore, an analysis was given to show that the high success rate obtained by the new predictor is by no means due to a trivial use of the GO annotations. This is because, for proteins annotated “subcellular location unknown” in the Swiss-Prot database, most (more than 99%) of their corresponding GO numbers in the GO database are also annotated “cellular component unknown”.
The information and clues for predicting the subcellular locations of proteins are actually buried in a series of tedious GO numbers, just as they are buried in a pile of complicated amino acid sequences, although in a different manner and at a different “depth”. To dig out the knowledge about their locations, a sophisticated operation engine is needed. The current predictor is one such engine and has proved to be a very powerful one. The Hum-PLoc classifier is available as a web-server at
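The voting scheme and the jackknife test described in the abstract can be illustrated with a minimal sketch. It is not the Hum-PLoc implementation: the abstract does not say how the basic classifiers differ, so varying K across them is an assumption here, as are the Euclidean distance, the function names (`knn_predict`, `ensemble_predict`, `jackknife_accuracy`), and the toy 2-D feature vectors, which stand in for the GO and PseAA descriptors.

```python
from collections import Counter
import math


def knn_predict(train, query, k):
    """Predict a label by the K-nearest-neighbor rule (Euclidean distance)."""
    # train is a list of (feature_vector, label) pairs.
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Fuse several KNN classifiers (one per K, an assumption) by majority vote."""
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(samples, ks=(1, 3, 5)):
    """Leave each sample out in turn, predict it from the rest, and score."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if ensemble_predict(rest, features, ks) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: two well-separated clusters, four samples each.
samples = [
    ((0.0, 0.1), "nucleus"), ((0.1, 0.0), "nucleus"),
    ((0.2, 0.2), "nucleus"), ((0.1, 0.2), "nucleus"),
    ((1.0, 0.9), "cytoplasm"), ((0.9, 1.0), "cytoplasm"),
    ((1.1, 1.1), "cytoplasm"), ((1.0, 1.1), "cytoplasm"),
]

print(ensemble_predict(samples, (0.05, 0.05)))
print(jackknife_accuracy(samples))
```

In the jackknife test each protein is predicted by a classifier trained on all the others, so every sample serves once as an unseen query. With real data the fraction returned by `jackknife_accuracy` would correspond to the overall success rate reported in the abstract.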

References in zbMATH (referenced in 24 articles)

  1. Ahmad, Jamal; Hayat, Maqsood: MFSC: multi-voting based feature selection for classification of Golgi proteins by adopting the general form of Chou’s PseAAC components (2019)
  2. Khan, Muslim; Hayat, Maqsood; Khan, Sher Afzal; Ahmad, Saeed; Iqbal, Nadeem: Bi-PSSM: position specific scoring matrix based intelligent computational model for identification of mycobacterial membrane proteins (2017)
  3. Khan, Zaheer Ullah; Hayat, Maqsood; Khan, Muazzam Ali: Discrimination of acidic and alkaline enzyme using Chou’s pseudo amino acid composition in conjunction with probabilistic neural network model (2015)
  4. Zhao, Xiaowei; Zhang, Jian; Ning, Qiao; Sun, Pingping; Ma, Zhiqiang; Yin, Minghao: Identification of protein pupylation sites using bi-profile Bayes feature extraction and ensemble learning (2013)
  5. Hayat, Maqsood; Khan, Asifullah: MemHyb: predicting membrane protein types by hybridizing SAAC and PSSM (2012)
  6. Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui: Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection (2012)
  7. Mei, Suyu: Predicting plant protein subcellular multi-localization by Chou’s PseAAC formulation based multi-label homolog knowledge transfer learning (2012)
  8. Mei, Suyu: Multi-kernel transfer learning based on Chou’s PseAAC formulation for protein submitochondria localization (2012)
  9. Chou, Kuo-Chen: Some remarks on protein attribute prediction and pseudo amino acid composition (2011)
  10. Khan, Asifullah; Majid, Abdul; Hayat, Maqsood: CE-PLoc: An ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition (2011)
  11. Mei, S.; Wang, F.; Zhou, S.: Gene ontology based transfer learning for protein subcellular localization (2011)
  12. Zhang, Ya-Nan; Pan, Xiao-Yong; Huang, Yan; Shen, Hong-Bin: Adaptive compressive learning for prediction of protein-protein interactions from primary sequence (2011)
  13. Esmaeili, Maryam; Mohabatkar, Hassan; Mohsenzadeh, Sasan: Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses (2010)
  14. Anand, Ashish; Suganthan, P. N.: Multiclass cancer classification by support vector machines with class-wise optimized genes and probability estimates (2009)
  15. Du, Pufeng; Cao, Shengjiao; Li, Yanda: SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm (2009)
  16. Perez-Bello, Alcides; Munteanu, Cristian Robert; Ubeira, Florencio M.; Lopes De Magalhães, Alexandre; Uriarte, Eugenio; González-Díaz, Humberto: Alignment-free prediction of mycobacterial DNA promoters based on pseudo-folding lattice network or star-graph topological indices (2009)
  17. Yang, Jian-Yi; Peng, Zhen-Ling; Yu, Zu-Guo; Zhang, Rui-Jie; Anh, Vo; Wang, Desheng: Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation (2009)
  18. Zhang, Li; Liao, Bo; Li, Dachao; Zhu, Wen: A novel representation for apoptosis protein subcellular localization prediction using support vector machine (2009)
  19. Lin, Hao: The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition (2008)
  20. Chen, Ying-Li; Li, Qian-Zhong: Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition (2007)
