Spark-based parallel calculation of 3D Fourier shell correlation for macromolecule structure local resolution estimation
METHODOLOGY
Open Access
Yongchun Lü1,2*, Xiangrui Zeng3, Xinhui Tian1, Xiao Shi1, Hui Wang1, Xiaohui Zheng1,2, Xiaodong Liu1, Xiaofang Zhao1,2, Xin Gao4 and Min Xu3*

From The 18th Asia Pacific Bioinformatics Conference, Seoul, Korea, 18–20 August 2020

*Correspondence: [email protected]; [email protected]
1 Institute of Computing Technology of the Chinese Academy of Sciences, Beijing, China
3 Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
Full list of author information is available at the end of the article
Abstract

Background: Resolution estimation is the main evaluation criterion for the reconstruction of macromolecular 3D structures in the field of cryo-electron microscopy (cryo-EM). Many methods exist to evaluate the 3D resolution of macromolecular structures reconstructed by single particle analysis (SPA) in cryo-EM and by subtomogram averaging (SA) in electron cryotomography (cryo-ET). As global methods, they measure the resolution of the structure as a whole, but they are inaccurate at detecting subtle local variations in the reconstruction. To detect such subtle variations in SPA and SA reconstructions, several local resolution methods have been proposed. The mainstream local resolution evaluation methods are based on local Fourier shell correlation (FSC), which is computationally intensive; however, the existing implementations rely on multi-threading on a single computer and therefore scale poorly.

Results: This paper proposes a new fine-grained 3D array partition method based on the key-value (K-V) format in Spark. Our method first converts 3D images to K-V data, which is then used for parallel 3D array partitioning and data exchange. The Spark-based distributed parallel computing framework thus resolves the scalability problem above: all 3D local FSC tasks are computed simultaneously across multiple nodes of a computer cluster. Experiments on real data show that, with accuracy unchanged, the 3D local resolution evaluation algorithm based on Spark fine-grained 3D array partition achieves an order-of-magnitude speedup over the mainstream local FSC algorithm, together with better fault tolerance and scalability.
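To make the two steps in the abstract concrete (3D image → key-value records, then an FSC computation per record), the sketch below is a minimal single-machine stand-in in NumPy. The function names `to_key_value_blocks`, `fsc_curve`, and `local_fsc_map` are illustrative, not the paper's API; it assumes cubic volumes whose side length is divisible by the block size, and uses non-overlapping blocks where the paper's method uses a finer-grained partition distributed as a Spark RDD.

```python
import numpy as np

def to_key_value_blocks(vol1, vol2, block=16):
    """Partition a pair of half-maps into key-value records: the key is a
    block's corner index and the value is the corresponding subvolume pair.
    A plain dict stands in for the Spark RDD used in the distributed setting."""
    records = {}
    n = vol1.shape[0]
    for i in range(0, n, block):
        for j in range(0, n, block):
            for k in range(0, n, block):
                sl = (slice(i, i + block), slice(j, j + block), slice(k, k + block))
                records[(i, j, k)] = (vol1[sl], vol2[sl])
    return records

def fsc_curve(sub1, sub2, n_shells=8):
    """Fourier shell correlation between two equally sized cubic volumes:
    one correlation value per spherical frequency shell."""
    f1 = np.fft.fftshift(np.fft.fftn(sub1))
    f2 = np.fft.fftshift(np.fft.fftn(sub2))
    n = sub1.shape[0]
    grid = np.indices(sub1.shape) - n // 2      # frequency coordinates
    radius = np.sqrt((grid ** 2).sum(axis=0))   # radial frequency per voxel
    edges = np.linspace(0, n // 2, n_shells + 1)
    curve = np.zeros(n_shells)
    for s, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        mask = (radius >= lo) & (radius < hi)
        num = np.sum(f1[mask] * np.conj(f2[mask])).real
        den = np.sqrt(np.sum(np.abs(f1[mask]) ** 2) * np.sum(np.abs(f2[mask]) ** 2))
        curve[s] = num / den if den > 0 else 0.0
    return curve

def local_fsc_map(vol1, vol2, block=16):
    """Single-machine stand-in for the distributed map step: one FSC curve
    per block key. In Spark this dict comprehension becomes an RDD .map()."""
    return {key: fsc_curve(a, b)
            for key, (a, b) in to_key_value_blocks(vol1, vol2, block).items()}
```

Since each record carries everything its FSC computation needs, the per-block work is embarrassingly parallel, which is what lets Spark scatter the tasks across cluster nodes.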
© The Author(s). 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License.