CRYSTALP2: sequence-based protein crystallization propensity prediction. Background: Current protocols yield crystals for <30% of known proteins, indicating that automatically identifying crystallizable proteins may improve high-throughput structural genomics efforts. We introduce CRYSTALP2, a kernel-based method that predicts the propensity of a given protein sequence to produce diffraction-quality crystals. This method utilizes the composition and collocation of amino acids, isoelectric point, and hydrophobicity, as estimated from the primary sequence, to generate predictions. CRYSTALP2 extends its predecessor, CRYSTALP, by enabling predictions for sequences of unrestricted size and provides improved prediction quality. Results: A significant majority of the collocations used by CRYSTALP2 include residues with high conformational entropy, or low entropy and high potential to mediate crystal contacts; notably, such residues are utilized by surface entropy reduction methods. We show that the collocations provide complementary information to the hydrophobicity and isoelectric point. Tests on four datasets show that CRYSTALP2 outperforms several existing sequence-based predictors (CRYSTALP, OB-score, and SECRET). CRYSTALP2’s accuracy, MCC, and AROC range between 69.3 and 77.5%, 0.39 and 0.55, and 0.72 and 0.79, respectively. Our predictions are similar in quality and are complementary to the predictions of the most recent ParCrys and XtalPred methods. Our results also suggest that, as work in protein crystallization continues (thereby enlarging the population of proteins with known crystallization propensities), the prediction quality of the CRYSTALP2 method should increase. The prediction model and the datasets used in this contribution can be downloaded from http://biomine.ece.ualberta.ca/CRYSTALP2/CRYSTALP2.html. Conclusion: CRYSTALP2 provides relatively accurate crystallization propensity predictions for a given protein chain that either outperform or complement the existing approaches. The proposed method can be used to support current efforts towards improving the success rate in obtaining diffraction-quality crystals.
Keywords for this software
References in zbMATH (referenced in 3 articles )
Showing results 1 to 3 of 3.
- Carugo, Oliviero (ed.); Eisenhaber, Frank (ed.): Data mining techniques for the life sciences (2016)
- Jahandideh, Samad; Mahdavi, Abbas: RFCRYS: sequence-based protein crystallization propensity prediction by means of random forest (2012)
- Zhang, Ya-Nan; Pan, Xiao-Yong; Huang, Yan; Shen, Hong-Bin: Adaptive compressive learning for prediction of protein-protein interactions from primary sequence (2011)