Compressed graph representation for scalable molecular graph generation
Journal of Cheminformatics Open Access
RESEARCH ARTICLE
Youngchun Kwon1,2, Dongseon Lee1, Youn‑Suk Choi1*, Kyoham Shin3 and Seokho Kang3*
Abstract
Recently, deep learning has been successfully applied to molecular graph generation. Nevertheless, mitigating the computational complexity, which increases with the number of nodes in a graph, has been a major challenge. This has hindered the application of deep learning-based molecular graph generation to large molecules with many heavy atoms. In this study, we present a molecular graph compression method to alleviate the complexity while maintaining the capability of generating chemically valid and diverse molecular graphs. We designate six small substructural patterns that are prevalent between two atoms in real-world molecules. These relevant substructures in a molecular graph are then converted to edges by regarding them as additional edge features along with the bond types. This reduces the number of nodes significantly without any information loss. Consequently, a generative model can be constructed in a more efficient and scalable manner with large molecules on a compressed graph representation. We demonstrate the effectiveness of the proposed method for molecules with up to 88 heavy atoms using the GuacaMol benchmark.

Keywords: Molecular graph generation, Compressed graph representation, Graph variational autoencoder, Deep learning

Introduction
Deep learning has revolutionized the design of novel molecules required for real-world industrial applications. Whereas traditional approaches have mostly been based on human knowledge and intuition, the use of deep learning has enabled the autonomous design of molecules by learning from previously accumulated data [1–3]. Most existing methods use deep generative models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs). Their capabilities depend on the way of representing a molecule. Such representations include simplified molecular-input line-entry system (SMILES) and molecular graph representation.
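The compression idea described in the abstract can be illustrated with a small sketch. This excerpt does not list the six substructural patterns, so the code below uses a single hypothetical stand-in: a carbon atom joined by single bonds to exactly two heavy-atom neighbours that are not bonded to each other. The edge label `BRIDGE_C`, the dictionary-based graph encoding, and the helper names `compress` and `decompress` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

SINGLE = "single"
BRIDGE_C = "single-C-single"  # hypothetical edge feature standing in for one pattern


def neighbors(edges, node):
    """Return the nodes that share an edge with `node`."""
    return [next(iter(e - {node})) for e in edges if node in e]


def compress(nodes, edges):
    """Greedily fold each eligible bridging carbon into an edge feature."""
    nodes, edges = dict(nodes), dict(edges)
    for v in sorted(nodes):  # sorted() copies the keys, so deleting is safe
        if nodes[v] != "C":
            continue
        nbrs = neighbors(edges, v)
        if len(nbrs) != 2:
            continue
        a, b = nbrs
        if frozenset((a, b)) in edges:  # would collide with an existing edge
            continue
        if edges[frozenset((v, a))] != SINGLE or edges[frozenset((v, b))] != SINGLE:
            continue
        del nodes[v], edges[frozenset((v, a))], edges[frozenset((v, b))]
        edges[frozenset((a, b))] = BRIDGE_C
    return nodes, edges


def decompress(nodes, edges):
    """Expand every bridge edge back into an explicit carbon node."""
    nodes, out_edges = dict(nodes), {}
    next_id = max(nodes) + 1
    for e, label in edges.items():
        if label == BRIDGE_C:
            a, b = tuple(e)
            nodes[next_id] = "C"
            out_edges[frozenset((a, next_id))] = SINGLE
            out_edges[frozenset((next_id, b))] = SINGLE
            next_id += 1
        else:
            out_edges[e] = label
    return nodes, out_edges


# Toy heavy-atom chain N-C-C-C-O with single bonds only (illustrative data).
atoms = {0: "N", 1: "C", 2: "C", 3: "C", 4: "O"}
bonds = {frozenset((i, i + 1)): SINGLE for i in range(4)}

small_atoms, small_bonds = compress(atoms, bonds)
restored_atoms, restored_bonds = decompress(small_atoms, small_bonds)
print(len(atoms), "->", len(small_atoms), "nodes")
```

In this toy chain, two of the five atoms are absorbed into edge features, and expanding the labelled edges recovers the same atom and bond counts (with fresh node identifiers), which mirrors the claim that the node count drops without information loss. A real implementation would need the paper's actual pattern set and a rule for resolving overlapping matches.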
*Correspondence: [email protected]; [email protected]
1 Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung‑ro, Yeongtong‑gu, Suwon, Republic of Korea
3 Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu‑ro, Jangan‑gu, Suwon, Republic of Korea
Full list of author information is available at the end of the article
Although the SMILES representation has been demonstrated to be useful, recent research tends to employ the molecular graph representation, which is a natural and intuitive way of representing a molecule by regarding its atoms and bonds as nodes and edges, respectively [1]. A major challenge for molecular graph generation is addressing the scalability issue caused by its high computational complexity [4]. The representation of a molecular graph G = (V, E) on which a model learns, where V and E are the sets of nodes and edges in G, respectively, typically involves