MEASURING THE PERFORMANCE OF PARALLEL COMPUTERS WITH DISTRIBUTED MEMORY
UDC 681.3
R. A. Iushchenko
Basic techniques for measuring the performance of parallel computers with distributed memory are considered. The results obtained with the de facto standard LINPACK benchmark suite are shown to be only weakly related to the efficiency of applied parallel programs. As a result, the models and methods of macro-pipelined computations proposed by V. M. Glushkov in the late 1970s become topical again. These results are presented in the context of the modern architecture of cluster complexes.

Keywords: parallel computations, performance, optimization, communication expenses, cluster, supercomputer, data processing, high-performance computing, HPC, MIMD.

INTRODUCTION

Supercomputers are used to solve problems whose formulation, scope, and level of detail personal computers cannot handle in principle because of constraints on memory and performance. The computational speed of supercomputers has increased considerably in recent years and continues to grow rapidly. Although processor clock frequency has approached its physical limit, the parallelization of algorithms has opened the way to further increases in performance. This trend has equally affected massively parallel supercomputers, cluster systems, and personal computers. Supercomputers have been developed whose performance exceeds 1 PFLOPS. J. Dongarra, a recognized authority in the field of linear algebra and high-performance computing, has presented a number of interesting facts: on average, the computer that leads the Top-500 list in performance drops to the last place in that list after 6–8 years, and the capacity of the supercomputer occupying the last place becomes comparable to that of a personal computer after another 8–10 years [1].
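To make the LINPACK measurement concrete, the following is a minimal sketch (not the actual HPL implementation) of a LINPACK-style rating: a dense system Ax = b is solved by Gaussian elimination, and the standard operation count 2/3·n³ + 2·n² is divided by the wall-clock time. The function names and the pure-Python solver are illustrative assumptions, not part of the benchmark suite itself.

```python
import random
import time


def linpack_flops(n):
    # Standard LINPACK operation count for solving a dense n x n
    # system via LU factorization: 2/3 * n^3 + 2 * n^2 flops.
    return (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2


def solve(a, b):
    # Naive Gaussian elimination with partial pivoting (pure Python,
    # for illustration only; real LINPACK uses optimized BLAS).
    n = len(b)
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measure_mflops(n=200, seed=1):
    # Rate a single solve in MFLOPS, LINPACK-style.
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    t0 = time.perf_counter()
    solve(a, b)
    elapsed = time.perf_counter() - t0
    return linpack_flops(n) / elapsed / 1e6
```

Such a rating exercises dense linear algebra with high arithmetic intensity, which is precisely why it correlates poorly with applications dominated by communication or irregular memory access.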
Since the power consumption of a processor is proportional to the cube of its clock frequency, while good parallelization provides a linear increase in computational speed, using a large number of processors with a reduced clock frequency makes it possible to increase performance considerably at low power consumption. In particular, Intel has developed an experimental processor with 80 cores that delivers a peak performance of 1 TFLOPS while consuming only 67 W. The software world, however, turned out to be unprepared to exploit parallelism: a vast stock of ready-to-use sequential programs accumulated over the years of using conventional computers, and formal methods for parallelizing sequential programs do not yield considerable speedup in practice. The situation is further complicated by frequent changes in the established technologies of parallel programming for supercomputers, which force programs to be rewritten again and again. Nevertheless, the computer world has already become parallel, and no other way of achieving maximal performance exists. The objective of this article is to consider issues connected with the performance of parallel computers, factors e
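The power argument above can be made explicit with a toy calculation (an idealized model, not taken from the paper): under the assumptions that power scales as frequency cubed and speedup is perfectly linear, p cores clocked at f/p deliver the same aggregate throughput as one core at f, but at 1/p² of the power.

```python
def relative_power(p):
    # Power of p cores at frequency f/p, relative to one core at
    # frequency f, under the model power ~ frequency**3.
    # Aggregate throughput is unchanged: p * (f/p) = f.
    # Relative power: p * (f/p)**3 / f**3 = 1 / p**2.
    return p * (1.0 / p) ** 3


# Example: 4 cores at a quarter of the clock match the single-core
# throughput while drawing 1/16 of the power (idealized assumptions).
print(relative_power(4))  # → 0.015625
```

Real designs fall short of this ideal (leakage power, imperfect scaling, memory bottlenecks), but the cubic law explains why the many-core, lower-frequency direction won out.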