Performance Engineering and Energy Efficiency of Building Blocks for Large, Sparse Eigenvalue Computations on Heterogeneous Supercomputers

Abstract Numerous challenges have to be mastered as applications in scientific computing are being developed for post-petascale parallel systems. While ample parallelism is usually available in the numerical problems at hand, the efficient use of supercomputer resources requires not only good scalability but also a verifiably effective use of resources on the core, the processor, and the accelerator level. Furthermore, power dissipation and energy consumption are becoming further optimization targets besides time-to-solution. Performance Engineering (PE) is the pivotal strategy for developing effective parallel code on all levels of modern architectures. In this paper we report on the development and use of low-level parallel building blocks in the GHOST library (“General, Hybrid, and Optimized Sparse Toolkit”). We demonstrate the use of PE in optimizing a density of states computation using the Kernel Polynomial Method, and show that reduction of runtime and reduction of energy are literally the same goal in this case. We also give a brief overview of the capabilities of GHOST and the applications in which it is being used successfully.

M. Kreutzer • F. Shahzad • G. Hager • G. Wellein
Erlangen Regional Computing Center, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany
e-mail: [email protected]; [email protected]; [email protected]; [email protected]

A. Alvermann • A. Pieper • H. Fehske
Institute of Physics, Ernst-Moritz-Arndt-Universität Greifswald, Greifswald, Germany
e-mail: [email protected]; [email protected]; [email protected]

M. Galgon • B. Lang
Bergische Universität Wuppertal, Wuppertal, Germany
e-mail: [email protected]; [email protected]

J. Thies • M. Röhrig-Zöllner • A. Basermann
German Aerospace Center (DLR), Simulation and Software Technology, Köln, Germany
e-mail: [email protected]; [email protected]; [email protected]

A.R. Bishop
Theory, Simulation and Computation Directorate, Los Alamos National Laboratory, Los Alamos, NM, USA
e-mail: [email protected]

© Springer International Publishing Switzerland 2016
H.-J. Bungartz et al. (eds.), Software for Exascale Computing – SPPEXA 2013-2015, Lecture Notes in Computational Science and Engineering 113, DOI 10.1007/978-3-319-40528-5_14

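To make the abstract's central example concrete: the Kernel Polynomial Method (KPM) computes the density of states from Chebyshev moments of the (rescaled) Hamiltonian via a three-term recurrence. The following minimal Python sketch is our illustration, not code from the paper or the GHOST API; the function name `kpm_moments` and its parameters are hypothetical, and SciPy's sparse matrix-vector product stands in for GHOST's optimized heterogeneous kernels.

```python
# Hypothetical sketch of stochastic Chebyshev-moment estimation for a KPM
# density-of-states calculation (illustrative only; not the GHOST library).
import numpy as np
import scipy.sparse as sp

def kpm_moments(H, num_moments=64, num_vectors=8, seed=0):
    """Estimate Chebyshev moments mu_m = Tr[T_m(H)] / N stochastically.

    Assumes H is a real-symmetric sparse matrix already rescaled so that
    its spectrum lies in [-1, 1], as KPM requires.
    """
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    mu = np.zeros(num_moments)
    for _ in range(num_vectors):
        r = rng.choice([-1.0, 1.0], size=n)  # random +/-1 probe vector
        v_prev = r.copy()                    # T_0(H) r = r
        v = H @ r                            # T_1(H) r = H r
        mu[0] += r @ v_prev
        mu[1] += r @ v
        for m in range(2, num_moments):
            # Three-term Chebyshev recurrence; the sparse matrix-vector
            # product dominates the runtime (and energy) of each step.
            v_prev, v = v, 2.0 * (H @ v) - v_prev
            mu[m] += r @ v
    return mu / (num_vectors * n)

# Example: nearest-neighbor tight-binding chain, spectrum in [-2, 2],
# rescaled into [-1, 1] with a small safety margin.
n = 10_000
H = sp.diags([np.ones(n - 1), np.ones(n - 1)], offsets=[-1, 1]) * (0.5 / 1.01)
mu = kpm_moments(H)
```

The density of states is then reconstructed from these moments as a kernel-damped Chebyshev series. Since each iteration consists of one sparse matrix-vector multiplication plus a few vector operations, optimizing these low-level building blocks directly reduces both runtime and energy, which is the theme of this paper.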

1 Introduction

The supercomputer architecture landscape has encountered dramatic changes in the past decade. Heterogeneous architectures hosting different compute devices (CPU, GPGPU, and Intel Xeon Phi) and systems running 10^5 cores or more have dominated the Top500 top ten [33] since the year 2013. Since then, however, turnover in the top ten has slowed down considerably. A new impetus is expected from the “Collaboration of Oak Ridge, Argonne, and Livermore” (CORAL) with multi-100 Pflop/s systems to be installed around 2018. These systems may feature high levels of thread parallelism and multiple compute devices at the node level, and will exploit massive data parallelism through