BaCelLo: a balanced subcellular localization predictor.

Motivation. Knowledge of the subcellular localization of a protein is fundamental for elucidating its function. Determining the subcellular location of proteins in eukaryotic cells with high-throughput experimental procedures is difficult, so computational procedures are needed to annotate the subcellular location of proteins in large-scale genomic projects.

Results. BaCelLo is a predictor for five classes of subcellular localization (secretory pathway, cytoplasm, nucleus, mitochondrion and chloroplast); it is based on several SVMs organized in a decision tree. The system exploits the information derived from the residue sequence and the evolutionary information contained in alignment profiles. It analyzes the composition of the whole sequence as well as the compositions of the N- and C-termini. The training set is curated to avoid redundancy. For the first time, a balancing procedure is introduced to mitigate the effect of biased training sets. Three kingdom-specific predictors are implemented, for animals, plants and fungi respectively. When proteins from animals and fungi are distributed into four classes, BaCelLo reaches accuracies of 74% and 76%, respectively; a score of 67% is obtained when proteins from plants are distributed into five classes. BaCelLo outperforms the other presently available methods for the same task and gives more balanced accuracy and coverage values across the classes. We also predict the subcellular localization of proteins in five whole proteomes (Homo sapiens, Mus musculus, Caenorhabditis elegans, Saccharomyces cerevisiae and Arabidopsis thaliana) and compare the protein content of each compartment.

Availability. BaCelLo can be accessed at
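As an illustration of the composition input described under Results, here is a minimal Python sketch that computes residue-composition features over the whole sequence and over its N- and C-terminal segments. The function names and the terminal window lengths (`n_len`, `c_len`) are hypothetical choices for illustration, not BaCelLo's published settings. The sketch uses only the raw residue sequence, whereas BaCelLo also derives compositions from alignment profiles, and it omits the SVMs and the decision tree entirely.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes; other symbols (e.g. X) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment: str) -> list[float]:
    """Relative frequency of each standard residue in `segment` (sums to 1, or all 0)."""
    counts = Counter(res for res in segment.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_features(sequence: str, n_len: int = 20, c_len: int = 20) -> list[float]:
    """Concatenate whole-sequence, N-terminal and C-terminal compositions (60 values).

    n_len and c_len are hypothetical window sizes chosen for illustration only.
    """
    if n_len <= 0 or c_len <= 0:
        # sequence[-0:] would silently return the whole sequence, so reject 0.
        raise ValueError("terminal window lengths must be positive")
    whole = composition(sequence)
    n_term = composition(sequence[:n_len])   # first n_len residues
    c_term = composition(sequence[-c_len:])  # last c_len residues
    return whole + n_term + c_term
```

A vector like this (one per protein) is the kind of fixed-length input an SVM can consume; if a sequence is shorter than a window, the slice simply covers the whole sequence.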
