


PERSPECTIVE

DeepCOMO: from structure–activity relationship diagnostics to generative molecular design using the compound optimization monitor methodology

Dimitar Yonchev1 · Jürgen Bajorath1

Received: 31 July 2020 / Accepted: 29 September 2020
© The Author(s) 2020

Abstract
The compound optimization monitor (COMO) approach was originally developed as a diagnostic approach to aid in evaluating development stages of analog series and progress made during lead optimization. COMO uses virtual analog populations for the assessment of chemical saturation of analog series and has been further developed to bridge between optimization diagnostics and compound design. Herein, we discuss key methodological features of COMO in its scientific context and present a deep learning extension of COMO for generative molecular design, leading to the introduction of DeepCOMO. Applications on exemplary analog series are reported to illustrate the entire DeepCOMO repertoire, ranging from chemical saturation and structure–activity relationship progression diagnostics to the evaluation of different analog design strategies and prioritization of virtual candidates for optimization efforts, taking into account the development stage of individual analog series.

Keywords  Analog series · Lead optimization · Chemical saturation · SAR progression · Activity prediction · Generative deep learning

* Jürgen Bajorath [email protected]‑bonn.de
1 Department of Life Science Informatics, B‑IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115 Bonn, Germany

Introduction
The intuition- and experience-driven process of hit-to-lead and lead optimization (LO) presents key challenges for medicinal chemistry. If successful, it ranges from the initial demonstration of sustainable structure–activity relationships (SARs) of selected active compounds and the iterative generation of many analogs to the final stages of confirming pre-clinical candidate status of optimized compound(s). To date, the LO process is difficult, if not impossible, to rationalize. Work on analog series (ASs) continues until multi-property optimization criteria are met or insurmountable roadblocks are hit. This is typically far from being a black-and-white scenario. Partly unclear SAR responses or rather subtle differences between desirable and undesirable compound properties often propagate through optimization efforts until they amplify and result in large-magnitude problems. At such stages, when much work has already been invested on the long road to candidate compounds, it is often difficult to call it a day and discontinue work on advanced series. As a matter of fact, answering the question of when sufficient numbers of analogs have been generated and further progress is unlikely is at least as critical in the practice of medicinal chemistry as making meaningful initial decisions about which compounds or series to advance. In light of these caveats looming over optimization effort