From Big Data to Artificial Intelligence: chemoinformatics meets new challenges


Journal of Cheminformatics Open Access

EDITORIAL

Igor V. Tetko1,2* and Ola Engkvist3

Abstract
The increasing volume of biomedical data in chemistry and life sciences requires development of new methods and approaches for their analysis. Artificial Intelligence and machine learning, especially neural networks, are increasingly used in the chemical industry, in particular with respect to Big Data. This editorial highlights the main results presented during the special session of the International Conference on Neural Networks organized by the “Big Data in Chemistry” project and draws perspectives on the future progress of the field.

*Correspondence: [email protected]
1 Helmholtz Zentrum München-German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
Full list of author information is available at the end of the article

The analysis and exploitation of Big Data was the cornerstone of the “Big Data in Chemistry” (BIGCHEM) project and of this special issue, which was prepared following the International Conference on Neural Networks (ICANN2019). In total, 17 articles, including 15 contributions co-authored by BIGCHEM PhD students and partners, were published in this issue. Its themes covered many different aspects of the use of Big Data in medicinal chemistry [1, 2] that were actively pursued and advanced during the project.

The articles in the issue can be categorized into two main groups. The first group deals with machine learning methods to improve the analysis of large datasets such as those of high-throughput screening (HTS) campaigns. Rodríguez-Pérez et al. [3] compared structure-based fingerprints and protein–ligand interaction fingerprints (IFPs) for the prediction of ligand binding modes for protein kinases. The authors showed that including target-relevant information via IFPs improved predictions of the binding modes by about 10% compared to the use of traditional atom environment fingerprints. Laufkötter et al. [4] demonstrated that augmenting chemical structure descriptors with bioactivity-based fingerprints derived from HTS data provides better performance and, importantly, also superior scaffold hopping capability. Analogously, QSAR-derived affinity fingerprints (QAFFP) [5, 6] outperformed classical Morgan fingerprints for scaffold hopping.

While Morgan fingerprints, owing to their robustness and performance for small molecules (see the review of David et al. [7]), are frequently used as a gold standard in, e.g., virtual screening and target prediction, they might not be optimal for larger molecules, such as peptides. MinHashed Atom-Pair fingerprints with a diameter of up to four bonds (MAP4) [8] were introduced as a universal fingerprint providing good results for various targets.

HTS data are frequently imbalanced, with only a few active compounds: COVER (conformational oversampling as data augmentation for molecules) generates multiple conformations