A Wrapper-Based Feature Selection Method for ADMET Prediction Using Evolutionary Computing
Laboratorio de Investigación y Desarrollo en Computación Científica (LIDeCC), Departamento de Ciencias e Ingeniería de la Computación (DCIC), Universidad Nacional del Sur – Av. Alem 1253 – 8000 – Bahía Blanca, Argentina
2 Planta Piloto de Ingeniería Química (PLAPIQUI), Universidad Nacional del Sur – CONICET, Complejo CRIBABB – Camino La Carrindanga km. 7 – CC 717 – Bahía Blanca, Argentina
{saj,rlc,gev,ip}@cs.uns.edu.ar
Abstract. Wrapper methods search for a subset of features or variables in a data set such that these features are the most relevant for predicting a target value. In the chemoinformatics context, determining the most significant set of descriptors is of great importance because of their contribution to improving ADMET prediction models. In this paper, a comprehensive analysis of descriptor selection aimed at physicochemical property prediction is presented. In addition, we propose an evolutionary approach in which different fitness functions are compared. The comparison consists of establishing which method selects the subset of descriptors that best predicts a given property while keeping the cardinality of the subset to a minimum. The performance of the proposal was assessed for predicting hydrophobicity, using an ensemble of neural networks for the prediction task. The results showed that the evolutionary approach using a nonlinear fitness function constitutes a novel and promising technique for this bioinformatics application.
Keywords: Feature Selection, Genetic Algorithms, QSAR, hydrophobicity.
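To make the wrapper idea summarized in the abstract concrete, the following is a minimal Python sketch of genetic-algorithm feature selection, written under several assumptions that are not taken from the paper. A plain least-squares model stands in for the ensemble of neural networks. The fitness (validation error plus a linear penalty per selected descriptor) is only one possible way to trade prediction quality against subset cardinality and is not claimed to be any of the fitness functions compared in the paper. The names `fitness` and `ga_select`, all parameter values, and the synthetic data are hypothetical.

```python
import numpy as np

def fitness(mask, X_train, y_train, X_val, y_val, penalty=0.01):
    """Lower is better: validation MSE plus a penalty per selected feature.

    Assumption: a linear least-squares model replaces the neural network
    ensemble, and the penalty term is an illustrative choice.
    """
    if not mask.any():
        return np.inf  # an empty subset cannot predict anything
    # Selected columns plus a constant column for the intercept.
    A_train = np.column_stack([X_train[:, mask], np.ones(len(X_train))])
    A_val = np.column_stack([X_val[:, mask], np.ones(len(X_val))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    mse = np.mean((A_val @ coef - y_val) ** 2)
    return mse + penalty * mask.sum()

def ga_select(X_train, y_train, X_val, y_val, pop_size=30, generations=40,
              mutation_rate=0.05, penalty=0.01, seed=0):
    """Evolve boolean feature masks and return the best one found."""
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5  # random initial masks

    def score(mask):
        return fitness(mask, X_train, y_train, X_val, y_val, penalty)

    def tournament(scores):
        # Binary tournament: pick two individuals, keep the fitter one.
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if scores[i] <= scores[j] else pop[j]

    for _ in range(generations):
        scores = np.array([score(m) for m in pop])
        children = [pop[np.argmin(scores)].copy()]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(scores), tournament(scores)
            cross = rng.random(n_features) < 0.5         # uniform crossover
            child = np.where(cross, p1, p2)
            flip = rng.random(n_features) < mutation_rate  # bit-flip mutation
            children.append(child ^ flip)
        pop = np.array(children)

    scores = np.array([score(m) for m in pop])
    return pop[np.argmin(scores)]

# Toy data (synthetic, for illustration only): only descriptors 0 and 3 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
selected = ga_select(X[:150], y[:150], X[150:], y[150:])
print(np.flatnonzero(selected))  # indices of the selected descriptors
```

Because the model is retrained for every candidate mask, the search is driven by predictive performance on held-out data. That is what distinguishes a wrapper from a filter method, and it is also why wrappers are comparatively expensive.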
E. Marchiori and J.H. Moore (Eds.): EvoBIO 2008, LNCS 4973, pp. 188–199, 2008. © Springer-Verlag Berlin Heidelberg 2008

1 Motivation

In the pharmaceutical industry, when a new medicine has to be developed, a ‘serial’ process starts in which drug potency (activity) and selectivity are examined first [1]. Many of the candidate compounds fail at later stages because of their ADMET (absorption, distribution, metabolism, excretion and toxicity) behavior in the body. ADMET properties are related to the way a drug interacts with a large number of macromolecules, and they are the principal cause of failure in drug development [1]. Thus, a compound can look promising at first on the basis of its molecular structure, but other factors such as aggregation, limited solubility or limited uptake in the human organism render it useless as a drug. Nowadays, the failure rate of a potential drug before reaching the market is still high. The main problem is that most of the rules that govern ADMET behavior in the human body are unknown. For these reasons, interest in Quantitative Structure-Activity Relationships (QSAR) and Quantitative Structure-Property Relationships (QSPR) within the scientific and industrial communities has grown considerably in the last decades. Both of these approaches comprise the methods by which chemical structure parameters (known as descriptors) are quantitatively correlated with a well defined proces