clusterCL: comprehensive support for multi-kernel data-parallel applications in heterogeneous asymmetric clusters
Valon Raca¹ · Eduard Mehofer¹
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Heterogeneous cluster systems consisting of CPUs and different kinds of accelerators have become mainstream in HPC. Programming such systems is difficult and requires addressing manifold challenges that stem from the intricate composition of such systems and the peculiarities of scientific applications. A broad range of obstacles preventing efficient execution has to be considered and dealt with properly. In this paper, we propose a systematic approach and a framework capable of providing comprehensive support for running data-parallel applications in heterogeneous asymmetric clusters. Our implementation partitions and distributes work so that the workload is balanced across the cluster, while handling partitioning-induced communication and synchronization transparently. In our experimental section, we choose 11 representative scientific applications from different domains to evaluate our approach. Experimental results show a strong speedup and workload balance for different cluster configurations.
Keywords Heterogeneous computing · Asymmetric clusters · Scientific applications
¹ Faculty of Computer Science, University of Vienna, Vienna, Austria
1 Introduction
Heterogeneous computing with special-purpose devices and accelerators alongside CPUs has become state of the art in the field of high performance computing. The key benefits of heterogeneous architectures compared to homogeneous systems (better performance and better energy efficiency) have forced a paradigm shift, with most of the current top 10 supercomputers of the world [37] using GPUs or other accelerators. Specialized processors with tremendous peak performance make it feasible even for smaller institutions to run computationally demanding jobs in-house on cost-efficient clusters instead of transferring their data to computing centers. Such mid-size clusters are often not assembled from identical nodes but have an asymmetric hardware structure, either from the very beginning or due to broken hardware. Because such systems are now affordable even for smaller groups, programming support for heterogeneous, asymmetric clusters is becoming increasingly important.
Programming heterogeneous devices has been enabled and simplified by programming models such as CUDA [26], OpenCL [19], OpenACC [28] and OpenMP ≥ 4.0 [29]. Except for CUDA, these programming models offer a uniform programming approach to various types of compute devices. In essence, a data-parallel kernel written for one device can easily be ported to any other compute device that supports the same programming model. As long as single device units are targeted, these programming models offer adequate support. However, dist