

Author: Kong Xiangjun; Lv Fenglin; Luo Xiaoli; Pan Yuzhu; Wu Bin; Ren Yonggang; Li Yuanchao; Yang Qingwu
Publisher: Taylor & Francis Ltd
ISSN: 0892-7022
Source: Molecular Simulation, Vol. 37, Iss. 3, 2011-03, pp. 243-249
Abstract
A new kind of amino acid descriptor, which we named integrated property scores (IP scores), was derived from 516 physico-chemical properties using classical principal component analysis and employed to characterise the sequence pattern profile of 162 singly protonated tripeptides. Using partial least squares (PLS) regression coupled with genetic algorithm variable selection, the structural parameters resulting from this characterisation were then used to develop several robust quantitative structure-spectrum relationship models for the ion mobility spectrometry collision cross sections of these peptides. For 94.1% of the samples in the data panel, the results are satisfactorily accurate compared with the experimentally measured values. Subsequently, the predictive power and stability of the constructed models were analysed and tested in detail through both internal and external validation, giving correlation coefficients for fitting (r²), cross-validation (q²) and prediction of 0.978, 0.963 and 0.970, respectively. Furthermore, comparing the statistics obtained from linear PLS modelling with those from the nonlinear least squares support vector machine reveals a significant linear correlation, and also a modest nonlinear correlation, between the IP scores and the collision cross sections of peptide cations. We expect that this sequence-based method can also be used for modelling and predicting other properties and activities of peptides and proteins.
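The two internal-validation statistics quoted in the abstract, the fitting r² and the cross-validated q², can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a one-variable ordinary least-squares line stands in for the PLS model, q² is computed by leave-one-out cross-validation (the abstract does not say which scheme was used), and the `descriptor` and `cross_section` values are invented toy numbers, not data from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (stand-in for the PLS model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def total_sum_of_squares(ys):
    """Sum of squared deviations of the responses from their mean."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Fitting r2: 1 - (residual sum of squares) / (total sum of squares)."""
    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out q2: 1 - PRESS / (total sum of squares).

    Each sample is predicted by a model fitted without that sample;
    PRESS is the sum of the squared prediction errors.
    """
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Hypothetical toy data: one descriptor value and one response per sample.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cross_section = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

r2 = r_squared(descriptor, cross_section)
q2 = q_squared_loo(descriptor, cross_section)
print(f"r2 = {r2:.3f}, q2 = {q2:.3f}")
```

For a least-squares model scored this way, q² cannot exceed r², because each left-out sample is predicted no better than it is fitted. A small gap between the two, as in the reported 0.978 and 0.963, is commonly read as a sign that the model is not overfitted.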