pmTM-align: scalable pairwise and multiple structure alignment with Apache Spark and OpenMP

Open Access

SOFTWARE

Weiya Chen1, Chun Yao1, Yingzhong Guo1, Yan Wang2 and Zhidong Xue1*

*Correspondence: [email protected]; [email protected]
1 School of Software Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
Full list of author information is available at the end of the article

Abstract

Background: Structure comparison can provide useful information for identifying functional and evolutionary relationships between proteins. With the dramatic increase of protein structure data in the Protein Data Bank, computation time quickly becomes the bottleneck for large-scale structure comparisons. To handle informative multiple structure alignment tasks more efficiently, we propose pmTM-align, a parallel protein structure alignment approach based on mTM-align/TM-align. pmTM-align works in two stages: pairwise structure alignments are distributed with Spark, and the phylogenetic tree-based multiple structure alignment is then performed on a single computer with OpenMP.

Results: Experiments with the SABmark dataset showed that parallelization, together with data structure optimization, provided considerable speedup over mTM-align. The Spark-based structure alignments achieved near-ideal scalability with large datasets, and the OpenMP-based construction of the phylogenetic tree accelerated the incremental alignment of multiple structures and the metrics computation by a factor of about 2–5.

Conclusions: pmTM-align enables scalable pairwise and multiple structure alignment computing and offers more timely responses for medium- to large-sized input data than existing alignment tools such as mTM-align.

Keywords: Pairwise structure alignment, Multiple structure alignment, Apache Spark, OpenMP
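Because each of the N(N-1)/2 pairwise alignments is independent of the others, the first stage is an embarrassingly parallel map, whereas the tree-based stage needs the complete set of pairwise scores before it can start. The following Python sketch illustrates that two-stage shape under loudly stated assumptions: `toy_pair_score`, `all_pair_scores` and `guide_tree` are hypothetical names, the toy string similarity stands in for a real TM-align call, a `ThreadPoolExecutor` stands in for both the Spark cluster and the OpenMP threads, and the average-linkage (UPGMA-style) merge rule is an illustrative choice, not necessarily the tree-building method used by mTM-align.

```python
# Minimal sketch of a two-stage "all pairs, then guide tree" workflow.
# ASSUMPTIONS (not pmTM-align's code): toy_pair_score is a hypothetical
# stand-in for TM-align, and a thread pool stands in for Spark/OpenMP.
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def toy_pair_score(a: str, b: str) -> float:
    """Hypothetical similarity in [0, 1]; a real pipeline would run TM-align."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def all_pair_scores(structs: list[str], workers: int = 4) -> dict[tuple[int, int], float]:
    """Stage 1: score all N(N-1)/2 pairs independently, in parallel."""
    pairs = list(combinations(range(len(structs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda p: toy_pair_score(structs[p[0]], structs[p[1]]), pairs))
    return dict(zip(pairs, scores))


def guide_tree(names: list[str], scores: dict[tuple[int, int], float]):
    """Stage 2: average-linkage (UPGMA-style) tree built from pairwise similarities."""
    def sim(i: int, j: int) -> float:
        return scores[(min(i, j), max(i, j))]

    # Each cluster is (subtree, member indices); start with one leaf per structure.
    clusters = [(name, [i]) for i, name in enumerate(names)]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the highest average member-to-member similarity.
        for x, y in combinations(range(len(clusters)), 2):
            mx, my = clusters[x][1], clusters[y][1]
            avg = sum(sim(i, j) for i in mx for j in my) / (len(mx) * len(my))
            if best is None or avg > best[0]:
                best = (avg, x, y)
        _, x, y = best
        # Merge the two clusters into a new internal node of the tree.
        merged = ((clusters[x][0], clusters[y][0]), clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    names = ["A", "B", "C"]
    structs = ["ACDEFG", "ACDEFH", "WWWWWW"]
    scores = all_pair_scores(structs)
    print(scores)
    print(guide_tree(names, scores))  # ('C', ('A', 'B'))
```

Here A and B are merged first because they form the most similar pair, so the tree comes out as ('C', ('A', 'B')). In a real deployment the stage-1 map would be spread across Spark executors, and only the resulting scores would need to be gathered on one machine for the tree-based stage.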

Background

The three-dimensional structure of a protein plays an important role in inferring its molecular function, and it is more conserved than sequence during evolution. Structure comparison can be used to identify functional and evolutionary relationships between proteins, which is very useful for functional annotation, structure-based drug design, protein–protein docking and many other applications [1]. To compare two protein structures, we first need to find the best structural alignment between them, which establishes the residue-level correspondence used for comparison. Many pairwise structure alignment (PSA) methods have been developed for this purpose, such as DALI [2], CE [3] and TM-align [4]. PSA can be further generalized to three or more structures, which

© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in