Improving Clairvoyant: reduction algorithm resilient to imbalanced process arrival patterns

Jerzy Proficz¹ · Krzysztof M. Ocetkiewicz¹

Accepted: 28 October 2020
© The Author(s) 2020

Abstract

The Clairvoyant algorithm, proposed in “A novel MPI reduction algorithm resilient to imbalances in process arrival times”, was analyzed, commented on, and improved. The comments concern the handling of certain edge cases in the original pseudocode and description, i.e., adding another process state, improved cache friendliness, more precise complexity estimations, and some other issues improving the robustness of the algorithm implementation. The proposed improvements include skipping idle loop rounds, simplifying the generation of the ready set and the management of the state array, and an approximately 90-fold reduction in memory usage. Finally, an extension enabling the prediction of process arrival times (PATs) was added: an additional background thread is used to exchange the data with the PAT estimations. Tests performed with a dedicated mini-benchmark in an HPC environment showed the correctness and improved performance of the solution in comparison with the original and other state-of-the-art algorithms.

Keywords Reduce · Clairvoyant · Process arrival pattern · MPI

Jerzy Proficz [email protected]
Krzysztof M. Ocetkiewicz [email protected]

¹ Gdansk University of Technology, Centre of Informatics-Tricity Academic Supercomputer and NetworK (CI TASK), 11/12 Gabriela Narutowicza Street, 80-233 Gdańsk, Poland

1 Introduction

Current trends in high performance computing (HPC), stimulated by the rapid growth of Artificial Intelligence (AI), the Internet of Things (IoT), and Big Data analysis methods and tools, show massive development of compute cluster architectures, where most supercomputers consist of independent nodes connected by a fast interconnection network, usually InfiniBand or Ethernet. In such an environment, a natural approach to providing data exchange and synchronization mechanisms is the message passing paradigm, with the widely used Message Passing Interface (MPI) standard [24] and its support for both point-to-point and collective operations. Thus, we can observe rapid progress in the parallelization of existing compute methods and applications, especially those requiring large compute resources. We expect that improvements in HPC communication algorithms and protocols, including our work, will enable faster optimization in the above crucial areas (IoT, AI, Big Data), especially in topics such as the parallelization of hybrid parallel FDTD methods [22], intelligent home systems supported by neural networks [26], Big Data related programming models and systems [1], and voice evaluation mechanisms involving complex techniques such as bio-inspired algorithms or spiking neural networks [14]. In the aforementioned cluster environment, every compute node, in fact a separate computer, has its own RAM, processor(s), I/O devices and possibly accelerators (espec