

Author: Tanabe K.
Publisher: Taylor & Francis Ltd
ISSN: 1062-936X
Source: SAR and QSAR in Environmental Research, Vol. 24, Iss. 7, 2013-07, pp. 565-580
Abstract
A new sensitivity analysis (SA) method for variable selection in support vector machines (SVMs) is proposed to improve the performance of a QSAR model for predicting carcinogenicity that was based on the correlation coefficient (CC) method in our preceding study. The performance of both methods was also compared with that of the F-score (FS) method proposed by Chang and Lin. A set of 911 non-congeneric chemicals was classified into 20 mutually overlapping groups according to the substructures they contain. For each group, a specific SVM model built on the chemicals belonging to that group was optimized by searching for the best set of SVM parameters while successively omitting the descriptors with the lowest absolute values of sensitivity, CC or FS until the maximum predictive performance was obtained. The SA method improves the overall accuracy from 80% for CC and FS to 84%, which is considerably higher than that of existing models for predicting the carcinogenicity of non-congeneric chemicals. It selects optimum sets of effective descriptors that are smaller than those selected by the CC and FS methods, is not time-consuming, and can be applied to a large set of initial descriptors. It is concluded that SA is superior as a variable selection method in SVM models.
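The successive-omission procedure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The F-score formula below is the one commonly used for SVM feature selection (between-class separation of the means divided by the sum of the within-class sample variances), and it is assumed here to match the FS method referred to above. The names `f_score`, `backward_elimination` and `evaluate` are hypothetical; in a real workflow `evaluate` would return the cross-validated performance of an SVM after a parameter search on the given descriptor subset. The sketch also scans every subset size and keeps the best one, which is one possible reading of "until the maximum predictive performance was obtained".

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one descriptor from its values in the two classes.

    Assumed formula: squared distances of the class means from the
    overall mean, divided by the sum of the sample variances.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # (n - 1) denominators
    return between / within if within > 0 else 0.0


def backward_elimination(scores, evaluate):
    """Omit the descriptor with the lowest |score| one at a time.

    scores:   dict mapping descriptor name -> ranking score
              (sensitivity, CC or FS).
    evaluate: hypothetical callable taking a list of descriptor names
              and returning a performance value (higher is better).
    Returns the best subset found and its performance.
    """
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    best_subset, best_perf = list(ranked), evaluate(ranked)
    while len(ranked) > 1:
        ranked = ranked[:-1]  # drop the currently lowest-ranked descriptor
        perf = evaluate(ranked)
        if perf >= best_perf:  # ties favour the smaller subset
            best_subset, best_perf = list(ranked), perf
    return best_subset, best_perf


# Toy usage: a stand-in evaluator that prefers the subset {"a", "b"}.
toy_scores = {"a": 3.0, "b": -2.0, "c": 0.1}
toy_eval = lambda subset: 0.9 if set(subset) == {"a", "b"} else 0.8
subset, perf = backward_elimination(toy_scores, toy_eval)
```

Ranking by absolute value mirrors the abstract's wording, since a sensitivity or correlation coefficient can be negative while still marking an informative descriptor.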