Prediction of Mutagenicity of Chemicals from Their Calculated Molecular Descriptors: A Case Study with Structurally Homogeneous versus Diverse Datasets

Publisher: Bentham Science Publishers

E-ISSN: 1875-6697

ISSN: 1573-4099

Source: Current Computer-Aided Drug Design, Vol. 11, Iss. 2, 2015-09, pp. 117-123



Abstract

Variation in high-dimensional data is often caused by a few latent factors, and hence dimension reduction or variable selection techniques are often helpful in extracting useful information from the data. In this paper we consider two such recent methods: interrelated two-way clustering and envelope models. We couple these methods with traditional statistical procedures like ridge regression and linear discriminant analysis, and apply them to two data sets which have more predictors than samples (i.e. the n << p scenario) and several types of molecular descriptors. One of these datasets consists of a congeneric group of amines, while the other has a much more diverse collection of compounds. The difference in prediction results between these two datasets for both methods supports the hypothesis that for a congeneric set of compounds, descriptors of a certain type are enough to provide good QSAR models, but as the data set grows more diverse, including a variety of descriptors can improve model quality considerably.