Predicting activities without computing descriptors: graph machines for QSAR

Author: Goulon A.  

Publisher: Taylor & Francis Ltd

ISSN: 1062-936X

Source: SAR and QSAR in Environmental Research, Vol.18, Iss.1-2, 2007-01, pp. : 141-153

Disclaimer: Any content in publications that violate the sovereignty, the constitution or regulations of the PRC is not accepted or approved by CNPIEC.

Previous Menu Next

Abstract

We describe graph machines, an alternative approach to traditional machine-learning-based QSAR, which circumvents the problem of designing, computing and selecting molecular descriptors. In that approach, which is similar in spirit to recursive networks, molecules are considered as structured data, represented as graphs. For each example of the data set, a mathematical function (graph machine) is built, whose structure reflects the structure of the molecule under consideration; it is the combination of identical parameterised functions, called "node functions" (e.g. a feedforward neural network). The parameters of the node functions, shared both within and across the graph machines, are adjusted during training with the "shared weights" technique. Model selection is then performed by traditional cross-validation. Therefore, the designer's main task consists in finding the optimal complexity for the node function. The efficiency of this new approach has been demonstrated in many QSAR or QSPR tasks, as well as in modelling the activities of complex chemicals (e.g. the toxicity of a family of phenols or the anti-HIV activities of HEPT derivatives). It generally outperforms traditional techniques without requiring the selection and computation of descriptors.