4 Parallelization

In the following, we discuss the parallelization of the linked cell method from Chapter 3. We will use domain decomposition [568] as the parallelization technique and MPI (Message Passing Interface) [7] as the communication library. Parallelization is used to reduce the time needed to execute the necessary computations. This is done by distributing the computations to several processors, which can then, at least to some extent, execute them simultaneously. In addition, parallelization has the advantage that a parallel computer often provides more memory than a single-processor machine, so that larger problems can be tackled.

In recent years, the development of modern computers has led to ever more powerful scalable parallel computer systems, see Figure 1.2. By now, such systems allow molecular dynamics simulations with many hundreds or thousands of millions of particles. The proper use of parallel computers used to be an art, since programming systems were very machine specific and programs developed on those machines were hard to test and difficult to port to other machines. Nowadays, however, there are (almost) fully developed programming environments that allow debugging of parallel codes and also ensure portability between different parallel computers.

We start with an overview of parallel computers and different parallelization strategies. Then, in Section 4.2, we present domain decomposition as a parallelization strategy for the linked cell method. In Section 4.3 we discuss its implementation with MPI in detail. Finally, in Section 4.5, we present some application examples for our parallelized algorithm, extending examples from Section 3.6 from the two- to the three-dimensional case.
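To fix ideas before the detailed treatment in Section 4.3, the following minimal sketch illustrates the programming model assumed throughout: every process runs the same executable, learns its own process number (rank) and the total number of processes at runtime, and will later use the rank to decide which part of the domain it works on. This is only an illustration of the MPI startup pattern, not the parallel linked cell code itself; the printed message is a placeholder.

```c
/* Minimal SPMD sketch with MPI (assumes an MPI installation, e.g. built with mpicc). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int rank, size;

  MPI_Init(&argc, &argv);                /* start the MPI runtime                */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* number of this process, 0..size-1    */
  MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes            */

  /* Every process executes the same program; in the domain decomposition of
     Section 4.2 the rank determines which subdomain a process is responsible for. */
  printf("process %d of %d started\n", rank, size);

  MPI_Finalize();                        /* shut down the MPI runtime            */
  return 0;
}
```

Such a program is typically started with a launcher (for instance mpirun or mpiexec) that creates the desired number of processes on the available processors.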

4.1 Parallel Computers and Parallelization Strategies

Taxonomy of Parallel Computers. Since 1966 (Flynn [234]), parallel computer systems have been categorized according to whether the data stream and/or the instruction stream is processed in parallel. In this way, the fundamental types SISD (single instruction/single data stream – the classical microprocessor), SIMD (single instruction/multiple data stream) and MIMD (multiple instruction/multiple data stream) can be distinguished.1 Older parallel computers by MasPar and the Connection Machine series by Thinking Machines, or some current designs such as the "array processor experiment" (APE), fall for instance into the class of SIMD computers. On these computers, programs are executed on an array of very many, but simple, processors. However, this particular architecture nowadays plays only a minor role and is used for some specific applications. Vector computers, such as the Cray T90 and SV1/2, the NEC SX-5 or the Fujitsu VPP, also fall into the class of SIMD computers. In such computers, the same instructions are executed in a quasi-parallel manner using the assembly line principle. In a certain sense, RISC (reduced instruction set computer) microprocessors also belong to this class. A RISC processor usually only executes simple instruction