Interpretation of machine learning models using Shapley values: application to compound potency and multi-target activity predictions



Interpretation of machine learning models using Shapley values: application to compound potency and multi-target activity predictions

Raquel Rodríguez‑Pérez1 · Jürgen Bajorath1

Received: 6 March 2020 / Accepted: 24 April 2020
© The Author(s) 2020

Abstract

Difficulties in interpreting machine learning (ML) models and their predictions limit the practical applicability of and confidence in ML in pharmaceutical research. There is a need for agnostic approaches that aid in the interpretation of ML models regardless of their complexity and that are also applicable to deep neural network (DNN) architectures and model ensembles. To these ends, the SHapley Additive exPlanations (SHAP) methodology has recently been introduced. The SHAP approach enables the identification and prioritization of features that determine compound classification and activity prediction using any ML model. Herein, we further extend the evaluation of the SHAP methodology by investigating a variant for the exact calculation of Shapley values for decision tree methods and systematically comparing this variant with the model-independent SHAP method in compound activity and potency value predictions. Moreover, new applications of the SHAP analysis approach are presented, including the interpretation of DNN models for the generation of multi-target activity profiles and of ensemble regression models for potency prediction.

Keywords: Machine learning · Black box character · Structure–activity relationships · Compound activity · Compound potency prediction · Multi-target modeling · Model interpretation · Feature importance · Shapley values
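To illustrate the comparison between exact Shapley value calculation for tree methods and the model-agnostic SHAP variant described above, a minimal sketch using the open-source shap Python package is shown below. The random forest regressor, the synthetic fingerprint-like data, and all variable names are illustrative assumptions and not part of the original study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
import shap  # SHapley Additive exPlanations library

# Hypothetical data set: 200 compounds encoded as 64-bit binary
# fingerprint-like vectors with synthetic potency values.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Exact Shapley values for tree ensembles (tree-based SHAP variant).
tree_explainer = shap.TreeExplainer(model)
tree_shap = tree_explainer.shap_values(X[:10])

# Model-agnostic approximation (kernel SHAP) with a sampled background set.
kernel_explainer = shap.KernelExplainer(model.predict, shap.sample(X, 50))
kernel_shap = kernel_explainer.shap_values(X[:10])

# Mean absolute difference between the two sets of feature contributions.
print(np.abs(tree_shap - kernel_shap).mean())
```

The tree-based variant computes Shapley values exactly by exploiting the tree structure, whereas the kernel-based variant approximates them from sampled feature coalitions, which motivates comparing the two for decision tree methods.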

* Jürgen Bajorath
[email protected]‑bonn.de

1 Department of Life Science Informatics, B‑IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115 Bonn, Germany

Introduction

Major tasks for machine learning (ML) in chemoinformatics and medicinal chemistry include predicting new bioactive small molecules or the potency of active compounds [1–4]. Typically, such predictions are carried out on the basis of molecular structure, more specifically, using computational descriptors calculated from molecular graph representations or conformations. For activity prediction, ML models are trained to systematically associate structural patterns, represented in more or less abstract forms, with known biological activities of small molecules. Classification models are derived for predicting class labels of test compounds (e.g., active/inactive or highly/weakly potent), whereas regression models predict numerical potency values. Supervised ML can also be applied to predict other molecular properties.
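As a concrete illustration of this workflow, the sketch below encodes a few compounds as Morgan fingerprints with RDKit and trains both a classification and a regression model with scikit-learn. The SMILES strings, activity labels, and potency values are toy placeholders chosen only to keep the example self-contained; they do not correspond to any data set used in this work.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Toy compounds (placeholder SMILES) with invented labels and potencies.
smiles = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1",
          "CCN(CC)CC", "CC(C)Cc1ccc(cc1)C(C)C(=O)O", "c1ccncc1"]
labels = [0, 1, 1, 0, 1, 0]               # active (1) / inactive (0)
potency = [4.2, 6.8, 7.1, 4.5, 7.9, 5.0]  # e.g., pIC50-like values

def fingerprint(smi, n_bits=1024):
    """Encode a molecule as a binary Morgan fingerprint (radius 2)."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(list(fp), dtype=float)

X = np.array([fingerprint(s) for s in smiles])

# Classification model: predicts class labels (active/inactive).
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Regression model: predicts numerical potency values.
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, potency)

print(clf.predict(X[:2]), reg.predict(X[:2]))
```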

Understanding model decisions is generally relevant for assessing the consistency of predictions and for detecting potential sources of model bias. Interpretability is also crucial for extracting knowledge from modeling efforts. Accordingly, there is high interest in better understanding the basis of correct ML predictions or failures [5–9]. For example, in structure–activity relat