Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites.

Proteins may simultaneously exist at, or move between, two or more different subcellular locations. Proteins with multiple locations or with dynamic features of this kind are particularly interesting because they may have very special biological functions that intrigue investigators in both basic research and drug discovery. For instance, of the 6408 human protein entries that have experimentally observed subcellular location annotations in the Swiss-Prot database (version 50.7, released 19 September 2006), 973 (approximately 15%) have multiple location sites. The total number of human protein entries in the same database (excluding those annotated as "fragment" or with fewer than 50 amino acids) is 14,370, leaving a gap of 14,370 − 6408 = 7962 entries for which nothing is known about their subcellular locations. Although computational approaches can be used to predict this missing information, all existing methods for predicting human protein subcellular localization have so far been limited to proteins with a single location site. To overcome this barrier, a new ensemble classifier, named Hum-mPLoc, was developed that can also handle proteins with multiple location sites. Hum-mPLoc is freely accessible to the public as a web server at Meanwhile, for the convenience of researchers in the relevant areas, Hum-mPLoc has been used to predict the subcellular locations of all human protein entries in the Swiss-Prot database that lack subcellular location annotations or are annotated as uncertain. The resulting large-scale predictions have been deposited in a downloadable Microsoft Excel file named "Tab_Hum-mPLoc.xls". This file is available at the same website and will be updated twice a year to include new human protein entries and to reflect the continuing development of Hum-mPLoc.
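The abstract does not describe how Hum-mPLoc combines its base classifiers or how it decides how many sites to report for a protein. As a purely illustrative sketch of one way an ensemble can return more than one location, the Python below counts per-classifier votes and keeps every location that reaches a vote-fraction threshold. This is a generic thresholded-voting scheme assumed for illustration, not the published Hum-mPLoc procedure. The function `ensemble_multi_site`, the `min_fraction` parameter, and the location labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def ensemble_multi_site(
    base_predictions: Iterable[Set[str]],
    min_fraction: float = 0.5,
) -> List[str]:
    """Combine per-classifier location sets into one multi-site prediction.

    Hypothetical thresholded voting (not the published Hum-mPLoc method):
    each base classifier casts one vote for every location it predicts, and
    a location is kept if at least `min_fraction` of the classifiers voted
    for it. If none reaches the threshold, the single most-voted location is
    returned so the protein still receives one site.
    """
    predictions = [set(p) for p in base_predictions]
    if not predictions:
        raise ValueError("need at least one base classifier prediction")
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")

    votes = Counter()
    for locations in predictions:
        votes.update(locations)  # a set, so at most one vote per location

    if not votes:
        return []  # no classifier predicted any location

    def rank(loc: str):
        # Higher vote count first; ties broken alphabetically for determinism.
        return (-votes[loc], loc)

    n = len(predictions)
    kept = [loc for loc, count in votes.items() if count / n >= min_fraction]
    if kept:
        return sorted(kept, key=rank)
    return [min(votes, key=rank)]


if __name__ == "__main__":
    # Four hypothetical base classifiers voting on one query protein.
    preds = [
        {"nucleus", "cytoplasm"},
        {"nucleus"},
        {"nucleus", "cytoplasm"},
        {"mitochondrion"},
    ]
    print(ensemble_multi_site(preds))  # ['nucleus', 'cytoplasm']
```

In this sketch the threshold sets the trade-off: a lower `min_fraction` reports more multi-site predictions, while a higher one pushes the output toward a single site. The fallback guarantees at least one location whenever any base classifier voted.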

References in zbMATH (referenced in 26 articles)

Showing results 1 to 20 of 26.

  1. Shen, Yinan; Tang, Jijun; Guo, Fei: Identification of protein subcellular localization via integrating evolutionary and physicochemical information into Chou’s general PseAAC (2019)
  2. Tarafder, Sumit; Toukir Ahmed, Md.; Iqbal, Sumaiya; Tamjidul Hoque, Md.; Sohel Rahman, M.: RBSURFpred: modeling protein accessible surface area in real and binary space using regularized and optimized regression (2018)
  3. Zhang, Shengli; Duan, Xin: Prediction of protein subcellular localization with oversampling approach and Chou’s general PseAAC (2018)
  4. Georgiou, D. N.; Karakasidis, T. E.; Megaritis, A. C.; Nieto, Juan J.; Torres, A.: An extension of fuzzy topological approach for comparison of genetic sequences (2015)
  5. Huang, Chao; Yuan, Jing-Qi: Predicting protein subchloroplast locations with both single and multiple sites via three different modes of Chou’s pseudo amino acid compositions (2013)
  6. Mei, Suyu: Predicting plant protein subcellular multi-localization by Chou’s PseAAC formulation based multi-label homolog knowledge transfer learning (2012)
  7. Mei, Suyu: Multi-kernel transfer learning based on Chou’s PseAAC formulation for protein submitochondria localization (2012)
  8. Zhang, Yongqing; Zhang, Danling; Mi, Gang; Ma, Daichuan; Li, Gongbing; Guo, Yanzhi; Li, Menglong; Zhu, Min: Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions (2012)
  9. Chou, Kuo-Chen: Some remarks on protein attribute prediction and pseudo amino acid composition (2011)
  10. Lin, Hao; Ding, Hui: Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition (2011)
  11. Mei, S.; Wang, F.; Zhou, S.: Gene ontology based transfer learning for protein subcellular localization (2011)
  12. Zhang, Ya-Nan; Pan, Xiao-Yong; Huang, Yan; Shen, Hong-Bin: Adaptive compressive learning for prediction of protein-protein interactions from primary sequence (2011)
  13. Georgiou, D. N.; Karakasidis, T. E.; Nieto, Juan J.; Torres, A.: A study of entropy/clarity of genetic sequences using metric spaces and fuzzy sets (2010)
  14. Huang, Chen; Zhang, Ruijie; Chen, Zhiqiang; Jiang, Yongshuai; Shang, Zhenwei; Sun, Peng; Zhang, Xuehong; Li, Xia: Predict potential drug targets from the ion channel proteins based on SVM (2010)
  15. Ji, Guoli; Wu, Xiaohui; Shen, Yingjia; Huang, Jiangyin; Quinn Li, Qingshun: A classification-based prediction model of messenger RNA polyadenylation sites (2010)
  16. Shen, Hong-Bin; Chou, Kuo-Chen: Gneg-mPLoc: a top-down strategy to enhance the quality of predicting subcellular localization of Gram-negative bacterial proteins (2010)
  17. Blum, Torsten; Briesemeister, Sebastian; Kohlbacher, Oliver: MultiLoc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction (2009)
  18. Brown, J. B.; Akutsu, Tatsuya: Identification of novel DNA repair proteins via primary sequence, secondary structure, and homology (2009)
  19. Du, Pufeng; Cao, Shengjiao; Li, Yanda: SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm (2009)
  20. Georgiou, D. N.; Karakasidis, T. E.; Nieto, J. J.; Torres, A.: Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou’s pseudo amino acid composition (2009)
