GPUs-RRTMG_LW: high-efficient and scalable computing for a longwave radiative transfer model on multiple GPUs
Yuzhu Wang¹ · Mingxin Guo¹ · Yuan Zhao¹ · Jinrong Jiang²
Accepted: 11 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Atmospheric radiative processes play an important role in climate simulations. As a radiative transfer scheme, the rapid radiative transfer model for general circulation models (RRTMG) is widely used in weather forecasting and climate simulation systems. However, its high computational cost poses a severe challenge to system performance, so improving the computational performance of the radiative transfer model has significant scientific and practical value. Numerous radiative transfer models have benefited from GPU acceleration. Nevertheless, few of them have exploited CPU/GPU cluster resources within heterogeneous high-performance computing platforms. In this paper, we demonstrate an approach that runs a large-scale, computationally intensive longwave radiative transfer model on a GPU cluster. First, a CUDA-based acceleration algorithm for the RRTMG longwave radiation scheme (RRTMG_LW) on multiple GPUs is proposed. Then, a heterogeneous, hybrid programming paradigm (MPI+CUDA) is presented and applied to the RRTMG_LW on a GPU cluster. After implementing the algorithm in CUDA Fortran, a multi-GPU version of the RRTMG_LW, namely GPUs-RRTMG_LW, was developed. The experimental results demonstrate that the multi-GPU acceleration algorithm is valid, scalable, and highly efficient when compared to a single GPU or CPU. Running the GPUs-RRTMG_LW on a K20 cluster achieved a 77.78× speedup when compared to a single Intel Xeon E5-2680 CPU core.

Keywords High-performance computing · Graphics processing unit · Compute Unified Device Architecture · Radiative transfer
1 Introduction
Due to the massive number of calculations involved, climate models and earth system models need support from high-performance computing (HPC) [1, 2]. Radiative transfer models, which are employed to calculate atmospheric radiative fluxes and heating rates [3], also demand HPC resources. Some of the most well-known radiative transfer models are the line-by-line radiative transfer model (LBLRTM) [4, 5], the rapid radiative transfer model (RRTM) [6], and the rapid radiative transfer model for general circulation models (RRTMG). As an accelerated version of the RRTM, the RRTMG can perform computations more efficiently [7, 8]. However, it still demands enormous computing resources for long-term climate simulations [9–11]. The Chinese Academy of Sciences-Earth System Model (CAS-ESM) [12–14] uses the Institute of Atmospheric Physics (IAP) of CAS Atmospheric General Circulation Model Version 4.0 (IAP AGCM4.0) [15, 16] as its atmospheric component model. Here, the IAP AGCM4.0 uses the RRTMG as its radiative pa