Parallel programming models for heterogeneous many-cores: a comprehensive survey



REGULAR PAPER

Jianbin Fang¹ · Chun Huang¹ · Tao Tang¹ · Zheng Wang²
Received: 18 February 2020 / Accepted: 29 May 2020
© China Computer Federation (CCF) 2020

Abstract

Heterogeneous many-cores are now an integral part of modern computing systems, ranging from embedded systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high performance, this potential can only be unlocked if application programs are suitably parallel and can be matched to the underlying heterogeneous platform. In this article, we provide a comprehensive survey of parallel programming models for heterogeneous many-core architectures and review compilation techniques for improving programmability and portability. We examine various software optimization techniques for minimizing the communication overhead between heterogeneous computing devices. We provide a road map for a wide variety of research areas. We conclude with a discussion of open issues in the area and potential research directions. This article provides both an accessible introduction to the fast-moving area of heterogeneous programming and a detailed bibliography of its main achievements.

Keywords  Heterogeneous computing · Many-core architectures · Parallel programming models

¹ Institute for Computer Systems, College of Computer, National University of Defense Technology, Changsha, China
² School of Computing, University of Leeds, Leeds, UK
* Corresponding author: Chun Huang

1 Introduction

Heterogeneous many-core systems are now commonplace (Owens et al. 2005, 2008). Combining a host CPU with specialized processing units (e.g., GPGPUs, Xeon Phis, FPGAs, DSPs and NPUs) has been shown in many cases to improve performance by orders of magnitude. As a recent example, Google's Tensor Processing Units (TPUs) are application-specific integrated circuits (ASICs) designed to accelerate machine learning workloads (Patterson 2018). Typically, the host CPU of a heterogeneous platform manages the execution context while the computation is offloaded to the accelerator or coprocessor. Effectively leveraging such platforms not only delivers high performance but also increases energy efficiency. These goals are largely achieved using simple yet customized hardware cores that use area more efficiently and dissipate less power (Chen et al. 2007). The increasing importance of heterogeneous many-core architectures can be seen from the TOP500 and Green500 lists, where a large number of supercomputers use both CPUs and accelerators (Green500 Supercomputers 2020; Top500 Supercomputers 2020). A closer look at the TOP500 list shows that seven of the top ten supercomputers are built upon heterogeneous many-core architectures (Table 1). On the other hand, this form of many-core architectu