Convolutional architectures for virtual screening



RESEARCH

Open Access

Isabella Mendolia1*, Salvatore Contino1, Ugo Perricone2*, Edoardo Ardizzone1 and Roberto Pirrone1

From Annual Meeting of the Bioinformatics Italian Society (BITS 2019), Palermo, Italy. 26-28 June 2019

* Correspondence: isabella. [email protected]; uperricone@fondazionerimed.com
1 Dipartimento di Ingegneria, Università degli Studi di Palermo, Viale delle Scienze, Edificio 6, 90128 Palermo, Italy
2 Gruppo Drug Design, Fondazione Ri.MED, 90133 Palermo, Italy

Abstract

Background: A Virtual Screening algorithm has to adapt to the different stages of the screening process. Early screening needs to ensure that all bioactive compounds are ranked in the first positions regardless of the number of false positives, while a second screening round aims at increasing the prediction accuracy.

Results: A novel CNN architecture is presented to this aim. It predicts the bioactivity of candidate compounds on CDK1 using a combination of molecular fingerprints as their vector representation, and it has been trained to achieve good results in terms of both enrichment factor and accuracy in different screening modes (98.55% accuracy in active-only selection, and 98.88% in high-precision discrimination).

Conclusion: The proposed architecture outperforms state-of-the-art ML approaches, and some interesting insights on molecular fingerprints are derived.

Keywords: Deep learning, Drug design, Molecular fingerprints, Bioactivity prediction, Virtual screening
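The two evaluation measures named in the Results can be illustrated with a minimal sketch. It assumes the commonly used definition of the enrichment factor (the fraction of actives in the top-ranked slice divided by the fraction of actives in the whole library) and a plain thresholded accuracy. The function names, the `top_fraction` parameter and the 0.5 threshold are illustrative assumptions and are not taken from the paper, which may compute these quantities differently.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      labels: Sequence[int],
                      top_fraction: float = 0.01) -> float:
    """Enrichment factor of a ranked list (assumed, commonly used definition).

    scores: predicted activity scores, higher means "more likely active".
    labels: 1 for known actives, 0 for inactives/decoys.
    top_fraction: fraction of the ranked library that is inspected.
    """
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must lie in (0, 1]")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")

    # Number of compounds in the inspected slice (at least one).
    n_top = max(1, int(round(top_fraction * n)))

    # Rank compounds by decreasing score and count actives in the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top slice relative to the hit rate of the whole library.
    return (actives_in_top / n_top) / (total_actives / n)


def accuracy(scores: Sequence[float],
             labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of compounds whose thresholded prediction matches the label."""
    if len(scores) == 0 or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    correct = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Toy library of ten compounds; the two actives are ranked first.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(toy_scores, toy_labels, top_fraction=0.2))
    print(accuracy(toy_scores, toy_labels, threshold=0.75))
```

Under this definition, a ranking that places every active at the top reaches the maximum enrichment for the chosen slice even when many false positives follow, which matches the early-screening goal stated in the Background of the abstract. Accuracy instead penalises every false positive, which is why it suits the second, high-precision round.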

Background

Virtual Screening (VS) is a routinely applied computational technique in drug design. However, some issues remain open because of the complexity of the algorithms behind a screening campaign, which leads to models with different prediction reliability. Clinical candidate molecules selected by drug detection must have a profile that meets different criteria, based not only on potency but also on selectivity, safety, and the so-called ADMET properties (Absorption, Distribution, Metabolism, Excretion and Toxicity). Therefore, the design of the optimal compound is a multidimensional challenge involving different aspects of Chemistry and Biology, which can be addressed using Machine Learning (ML). One key factor in the success of ML approaches to property prediction is the possibility of accessing and mining large data sets that contain heterogeneous information. Until recent years, the best performing ML techniques were “shallow” ones [1] that is

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherw