Optimal low-latency network topologies for cluster performance enhancement
Yuefan Deng1 · Meng Guo2 · Alexandre F. Ramos3,4 · Xiaolong Huang1 · Zhipeng Xu1,5 · Weifeng Liu6
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract  We propose that clusters interconnected with network topologies of minimal mean path length will increase their processing speeds. We test this heuristic by constructing clusters of up to 32 nodes with torus, ring, Chvátal, Wagner, Bidiakis, and optimal (minimal mean path length) topologies, and by simulating the performance of 256-node clusters with the same network topologies. The optimal (or near-optimal) low-latency network topologies are found by minimizing the mean path length of regular graphs. The selected topologies are benchmarked using ping-pong messaging, MPI collective communications, and standard parallel applications including effective bandwidth, FFTE, Graph 500, and the NAS parallel benchmarks. We establish strong correlations between cluster performance and network topology, especially the mean path length, for a wide range of applications. In communication-intensive benchmarks, the optimal-graph topologies deliver multifold performance gains over mainstream topologies. It is striking that merely adjusting the network topology suffices to reclaim performance from the same computing hardware.

Keywords  Network topology · Graph theory · Latency · Benchmarks
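The search step named in the abstract, minimizing the mean path length of regular graphs, can be sketched in a few lines. The Python snippet below (using networkx) computes the mean path length of some of the topologies listed above and runs a simple degree-preserving edge-swap hill climb; it is an illustrative sketch, not the authors' optimization code, and the node count, degree, step budget, and swap strategy are assumptions made for this example.

import networkx as nx

def mean_path_length(g):
    # Average shortest hop distance over all vertex pairs; lower values
    # imply fewer hops per message and hence lower network latency.
    return nx.average_shortest_path_length(g)

def search_low_mpl_regular_graph(n=32, degree=3, steps=2000, seed=0):
    # Illustrative search (not the paper's method): start from a random
    # regular graph and greedily accept degree-preserving edge swaps
    # that reduce the mean path length.
    g = nx.random_regular_graph(degree, n, seed=seed)
    while not nx.is_connected(g):
        seed += 1
        g = nx.random_regular_graph(degree, n, seed=seed)
    best, best_mpl = g, mean_path_length(g)
    for _ in range(steps):
        trial = best.copy()
        try:
            # double_edge_swap preserves the degree sequence, so the graph stays regular.
            nx.double_edge_swap(trial, nswap=1, max_tries=100)
        except nx.NetworkXException:
            continue
        if not nx.is_connected(trial):
            continue
        mpl = mean_path_length(trial)
        if mpl < best_mpl:
            best, best_mpl = trial, mpl
    return best, best_mpl

if __name__ == "__main__":
    # Reference topologies named in the abstract; the sizes are illustrative.
    print("ring, 32 nodes:         ", round(mean_path_length(nx.cycle_graph(32)), 3))
    print("Wagner graph, 8 nodes:  ", round(mean_path_length(nx.circulant_graph(8, [1, 4])), 3))
    print("Chvatal graph, 12 nodes:", round(mean_path_length(nx.chvatal_graph()), 3))
    g, mpl = search_low_mpl_regular_graph()
    print("searched 3-regular graph, 32 nodes:", round(mpl, 3))

Degree-preserving swaps keep every node's degree fixed, which matches the paper's restriction to regular graphs; the actual optimization procedure used by the authors may differ.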
Electronic supplementary material  The online version of this article (https://doi.org/10.1007/s11227-020-03216-y) contains supplementary material, which is available to authorized users.

* Alexandre F. Ramos
  [email protected]

Extended author information available on the last page of the article.

1 Introduction

The ever-increasing processing speeds of supercomputers, culminating in IBM Summit [6] with its peak speed of 201 PFlops and 2,414,592 cores, bring the exascale era within reach of systems and applications developers. To achieve the milestone of exascale computing, developers must reduce power consumption and increase processing speeds by means of, e.g., designing power-efficient processors (and other components) capable of delivering higher local performance, and designing networks capable of delivering low-latency, high-bandwidth communications. These goals have been achieved incrementally: the ratio of performance to power consumption of IBM Summit is greater than that of TaihuLight; IBM Summit's faster processing speed is reached with a smaller number of cores; and a comparison of the June 2018 and November 2018 Top 500 lists [6] shows the Sierra machine surpassing TaihuLight with a new High-Performance Linpack (HPL) result. Performance increases, however, cannot rely solely on raising the clock speeds of individual processors because of the power wall of Moore's law [60]. Consequently, the number of interconnected processors will keep increasing, and with it the impact of network topologies on cluster performance.