A novel parallel Markov clustering method in biological interaction network analysis under multi-GPU computing environme

  • PDF / 1,179,367 Bytes
  • 18 Pages / 439.37 x 666.142 pts Page_size
  • 27 Downloads / 172 Views

DOWNLOAD

REPORT


A novel parallel Markov clustering method in biological interaction network analysis under multi‑GPU computing environment You Fu1 · Wei Zhou1,2 

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract The various clustering methods are widely applied in analyzing biological interaction networks, such as the protein–protein interaction and the genetic interaction networks. With the rapid growth of these biological datasets in scale, much longer runtime is required to make cluster analyses on them. In this paper, we propose a novel parallel Markov clustering (MCL) method based on the ELLPACK-R sparse matrix format that can run on multiple graphic processing units (GPUs) equipped standalone computers. The method is implemented using the Compute Unified Device Architecture (CUDA) programming framework, and fine-grained warp-level optimization is introduced for improving the performance. The BioGRID, a largescale and freely accessible database of protein and genetic interactions, is adopted as the dataset in the experiment. The method has been assessed on a desktop computer equipped with two NVIDIA GTX 1070 GPUs. The results show that the proposed multi-GPU method can conduct MCL clustering on the full-size BioGRID database with about 6.5  min, that is much faster than the CPU serial MCL implementation which needs almost an hour and a half execution time. Keywords  Markov clustering · Biological network · PPI network · GPU computing · Compute Unified Device Architecture · CUDA

* Wei Zhou [email protected] 1

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

2

Network Information Center, Weifang Medical University, Weifang 261053, China



13

Vol.:(0123456789)



Y. Fu, W. Zhou

1 Introduction Biological network repositories deliver comprehensive protein and genetic interactions in a consistent and well-annotated format, while the information about functional affinities or structural similarities could be revealed through finding highly connected regions in these networks by various clustering methods. Recently, many research works in bioinformatics have studied the Markov clustering (MCL) method [1], and it has been demonstrated to be a more promising approach compared to other clustering algorithms  [2–4]. Although the MCL algorithm is more robust and reliable, the application of this algorithm in large-scale data sets is still a challenging task due to high computational and space complexities. In high-performance computing area, GPUs have already become the dominant accelerators in various parallel systems owing to the many-core, high memory bandwidth and cost-effective advantages  [5]. The GPUs also extend the domain of high-performance computing to personal computers which once only belongs to supercomputers. As an off-the-shelf computer component, GPUs are easily acquired compared to other specialized computation accelerators; therefore, personal computers could be transformed to desktop parallel computing systems without mu