Integrating software and hardware hierarchies in an autotuning method for parallel routines in heterogeneous clusters


Jesús Cámara¹ · Javier Cuenca¹ · Domingo Giménez²

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

A hierarchical approach for autotuning linear algebra routines on heterogeneous platforms is presented. Hierarchy helps to alleviate the difficulties of tuning parallel routines for high-performance computing systems. This paper analyzes the application of the hierarchical approach at both the hardware and software levels, using the basic matrix multiplication and the Strassen multiplication as proof of concept on multicore+coprocessor nodes. In this way, the hierarchical approach allows partial delegation of the efficient exploitation of the computing units in the node to the underlying direct autotuned matrix multiplication used in the base case.

Keywords Autotuning · Hybrid programming · Heterogeneous computing · Multicore · Manycore

¹ Department of Engineering and Technology of Computers, University of Murcia, Murcia, Spain
² Department of Computing and Systems, University of Murcia, Murcia, Spain
* Corresponding author: Javier Cuenca

1 Introduction

Today, standard computational nodes include one multicore CPU together with one or more coprocessors (typically GPUs and/or Many Integrated Core (MIC) devices, e.g., the Intel Xeon Phi). The basic computational components of these nodes have different architectures and computational capacities; therefore, they can be organized and managed hierarchically, with the basic computing units (CPU, GPU and MIC) having separate memory spaces and communicating through data transfers between the memory associated with the CPUs and that of the coprocessors. This heterogeneous and hierarchical organization makes the efficient exploitation of routines for those nodes difficult and requires techniques for exploiting the underlying heterogeneity and hierarchy.

In addition, linear algebra routines are widely used as basic computational kernels in scientific software, and their optimization for today's standard heterogeneous nodes would lead to important improvements when solving scientific problems based on highly efficient linear algebra libraries such as MKL [16], PLASMA [19], MAGMA [1] and Chameleon [8], whose routines base their optimization on implementations by blocks or tiles in which the basic kernel is a highly optimized matrix multiplication [13]. The matrix multiplication has been widely researched, and there are now many highly efficient implementations for today's systems [14, 15, 17].

As with computational systems, the optimization of linear algebra routines has traditionally been based on a hierarchical schema [6], with a set of basic linear algebra routines (BLAS) and higher-level routines (LAPACK) developed by blocks or tiles. A hierarchical and decentralized schema can be applied for the automatic optimization of linear algebra sof
