First Experiences with Intel Cluster OpenMP


Abstract. MPI and OpenMP are the de-facto standards for distributed-memory and shared-memory parallelization, respectively. By employing a hybrid approach, that is, combining OpenMP and MPI parallelization in one program, a cluster of SMP systems can be exploited. Nevertheless, mixing programming paradigms and writing explicit message passing code might increase the parallel program development time significantly. Intel Cluster OpenMP is the first commercially available OpenMP implementation for a cluster, aiming to combine the ease of use of the OpenMP parallelization paradigm with the cost efficiency of a commodity cluster. In this paper we present our first experiences with Intel Cluster OpenMP.

1 Introduction

The main advantage of shared-memory parallelization with OpenMP over MPI is that data can be accessed by all instruction streams without reasoning about whether it must be transferred beforehand. This allows for an incremental parallelization approach and leads to shorter parallel program development time. Complicated dynamic data structures and irregular and possibly changing data access patterns make programming in MPI more difficult, whereas the level of complexity introduced by shared-memory parallelization is lower in many cases. As OpenMP is a directive-based language, the original serial program can stay intact, which is an advantage over other shared-memory parallelization paradigms.

The downside of any shared-memory paradigm is that the resulting parallel program is restricted to execute in a single address space. Bus-based multiprocessor machines typically do not scale well beyond four processors for memory-intensive applications. Larger SMP and ccNUMA systems require scalable and thus expensive interconnects. Because of that, several attempts to bring OpenMP to clusters have been made in the past. In [6] an OpenMP implementation for the TreadMarks software was presented, which supports only a subset of the OpenMP standard. In [7] an OpenMP implementation on top of the page-based distributed shared-memory (DSM) system SCASH was presented for the Omni source-to-source translator. In this approach, all accesses to global variables are replaced by accesses into the DSM and all shared data is controlled by the DSM. Although the full OpenMP specification is implemented, support for the C++ programming language is missing.

R. Eigenmann and B.R. de Supinski (Eds.): IWOMP 2008, LNCS 5004, pp. 48–59, 2008. © Springer-Verlag Berlin Heidelberg 2008
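The directive-based, incremental style described in the introduction can be illustrated with a minimal sketch. The example is not taken from the paper; the function name dot_product and the choice of a reduction loop are assumptions made purely for illustration. The serial loop is left unchanged and a single directive requests parallel execution:

```c
/* Hypothetical helper, not taken from the paper: dot product of two vectors.
 * Compiled without OpenMP support, the pragma is ignored and the loop runs
 * serially. With OpenMP enabled, the iterations are divided among threads
 * and the per-thread partial sums are combined by the reduction clause. */
double dot_product(const double *a, const double *b, int n)
{
    double sum = 0.0;
    int i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because a compiler without OpenMP support simply skips the directive, the same source file remains a valid serial program, which is what makes loop-by-loop parallelization possible.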

In 2006, Intel released the first commercial implementation of OpenMP for clusters, named Intel Cluster OpenMP [4] (referred to as ClOMP in this paper). The full OpenMP 2.5 standard for Fortran, C and C++ is implemented, although nested parallel regions are not yet supported. This paper is organized as follows: In Section 2 we give an overview of OpenMP and Intel Cluster OpenMP. In Section 3 we present micro-benchmark measurements of OpenMP and ClOMP constructs and discuss which types of applications we expect to p