Improvement of carcinogenicity prediction performances based on sensitivity analysis in variable selection of SVM models

Author: Tanabe K.  

Publisher: Taylor & Francis Ltd

ISSN: 1062-936X

Source: SAR and QSAR in Environmental Research, Vol.24, Iss.7, 2013-07, pp. 565-580



Abstract

A new sensitivity analysis (SA) method for variable selection in support vector machine (SVM) models was proposed to improve the performance of the QSAR model for predicting carcinogenicity over that obtained with the correlation coefficient (CC) method used in our preceding study. The performance of both methods was also compared with that of the F-score (FS) method proposed by Chang and Lin. The 911 non-congeneric chemicals were classified into 20 mutually overlapping groups according to the substructures they contain. For each group, a specific SVM model was built on the chemicals belonging to that group and optimized by searching for the best set of SVM parameters while successively omitting the descriptors with the lowest absolute values of sensitivity, CC or FS until the maximum predictive performance was obtained. The SA method improves the overall accuracy from the 80% achieved by CC and FS to 84%, which is considerably higher than that of existing models for predicting the carcinogenicity of non-congeneric chemicals. It selects optimum sets of effective descriptors that are smaller than those selected by the CC and FS methods, is not time-consuming, and can be applied to a large set of initial descriptors. It is concluded that SA is superior as a variable selection method in SVM models.
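
The selection loop described in the abstract (rank the descriptors, then omit the lowest-ranked ones one at a time while tracking predictive performance) might be sketched as follows. This is only an illustration under several assumptions, not the author's implementation: the abstract does not define the sensitivity measure, so the ranking here uses the F-score in its commonly cited form (between-class separation of the means divided by the sum of within-class sample variances); a leave-one-out nearest-centroid classifier stands in for the SVM and its parameter search; and all function names and the toy data are hypothetical.

```python
from statistics import mean, variance


def f_scores(X, y):
    """F-score of each descriptor (column of X) for binary labels y (0 or 1)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        col_all = [row[j] for row in X]
        col_pos = [row[j] for row in pos]
        col_neg = [row[j] for row in neg]
        between = ((mean(col_pos) - mean(col_all)) ** 2
                   + (mean(col_neg) - mean(col_all)) ** 2)
        within = variance(col_pos) + variance(col_neg)  # sample variances (n - 1)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def loo_centroid_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`.

    Assumption: this is a lightweight stand-in for the cross-validated SVM.
    """
    hits = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [mean(r[c] for r in rows) for c in cols]
            dist[label] = sum((X[i][c] - m) ** 2 for c, m in zip(cols, centroid))
        hits += int(min(dist, key=dist.get) == y[i])
    return hits / len(X)


def backward_select(X, y, scores, evaluate):
    """Omit the descriptor with the lowest |score| one at a time.

    Returns the best-performing subset seen and its accuracy.
    """
    order = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    best_cols, best_acc = list(order), evaluate(X, y, order)
    while len(order) > 1:
        order = order[:-1]  # drop the currently lowest-ranked descriptor
        acc = evaluate(X, y, order)
        if acc >= best_acc:  # ties favour the smaller descriptor set
            best_cols, best_acc = list(order), acc
    return best_cols, best_acc


# Toy data (hypothetical): column 0 separates the classes, column 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0], [3.0, 1.2], [3.2, 4.8], [2.8, 3.1]]
y = [0, 0, 0, 1, 1, 1]

scores = f_scores(X, y)
best_cols, best_acc = backward_select(X, y, scores, loo_centroid_accuracy)
print(best_cols, best_acc)
```

Passing the evaluator as an argument keeps the ranking criterion and the classifier independent, so a CC- or sensitivity-based ranking, or a real SVM with a parameter search, could be substituted without changing the elimination loop.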
