Why do Users Need to Take Care of Their HPC Applications Efficiency?

D. A. Nikitenko 1*, P. A. Shvets 1,2**, and V. V. Voevodin 1,2***
(Submitted by E. E. Tyrtyshnikov)

1 Lomonosov Moscow State University, Moscow, 119991 Russia
2 Moscow Center of Fundamental and Applied Mathematics, Moscow, 119991 Russia

Received March 31, 2020; revised April 16, 2020; accepted April 20, 2020

Abstract: High-performance computing plays a very important role in modern scientific research. Since all scientists want to solve their problems faster, it is very important to speed up these computations. To this end, new algorithms are being developed, new HPC systems appear, and so on. However, little attention is paid to the efficiency of high-performance computations, which often leaves a vast amount of supercomputer resources idle. It is vital to change this situation; in particular, users need to be shown the importance and necessity of optimizing their applications. One of the main steps in this direction is to help users detect performance issues in their programs, analyze the criticality and root causes of these issues, and eliminate them in order to improve application performance. In this article we describe research being performed at Lomonosov Moscow State University aimed at solving this problem. In particular, we analyze the results of a survey of supercomputer center users, showing their opinions on efficiency analysis. We also share our vision of the HPC center workflow requirements needed to support system and application efficiency analysis. After that, we describe a software tool under development that allows any supercomputer user to obtain and analyze versatile statistics on the performance of their HPC jobs, helping them detect possible root causes of performance degradation.

DOI: 10.1134/S1995080220080132

Keywords and phrases: high-performance computing, supercomputer, application efficiency, performance analysis, performance statistics, system software, parallel program.

1. INTRODUCTION

The question of the efficiency of HPC systems and applications has been studied for at least 50 years (see, for example, [1]). During this time, many thoughtful and detailed research projects and studies have been carried out, eventually leading to the development of a vast number of versatile and complex software tools intended to analyze the efficiency of parallel applications. Nowadays there is a whole software “zoo” of debuggers, profilers, trace analysis tools, and simulators that help to analyze program performance from different sides. These tools are still very much needed, but the situation has changed, and relying on these tools alone is no longer sufficient. The ever-increasing complexity and heterogeneity of supercomputer systems have made it very difficult to develop a highly efficient parallel program that fully utilizes the available computing resources. On the other hand, the HPC area is growing fast [2], and there are more and more specialists from different scientific areas (astronomy, oil & gas, che