Distributed machine learning load balancing strategy in cloud computing services


Mingwei Li1,2 • Jilin Zhang1,2,3 • Jian Wan1,2,4 • Yongjian Ren1,2 • Li Zhou1,2 • Baofu Wu1,2 • Rui Yang1,2 • Jue Wang5



© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract Mobile service computing is a new cloud computing model that provides various cloud services to users of mobile intelligent terminals through mobile internet access. Quality of service is an essential problem in mobile service computing. In this paper, we present a series of studies on how to accelerate the training of a distributed machine learning (ML) model based on cloud services. Distributed ML has become the mainstream approach to training today's ML models. In traditional distributed ML based on bulk synchronous parallel, a temporary slowdown of any node in the cluster delays the computation of the other nodes because of frequent synchronization barriers, which degrades overall performance. We propose a load balancing strategy named adaptive fast reassignment (AdaptFR). Based on this strategy, we build a distributed parallel computing model called adaptive-dynamic synchronous parallel (A-DSP). A-DSP uses a more relaxed synchronization model to reduce the overhead of synchronization operations while ensuring the consistency of the model. A-DSP also implements the AdaptFR load balancing strategy, which addresses the straggler problem caused by performance differences between nodes while preserving the accuracy of the model. Experiments show that A-DSP can effectively improve training speed while preserving model accuracy in distributed ML model training.

Keywords Mobile service computing • Cloud service • Distributed machine learning • Load balancing • Adaptive fast reassignment
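To make the straggler effect described in the abstract concrete, the following is a minimal sketch in Python. It is not the AdaptFR algorithm, whose details are not given here. It only assumes that, under a bulk synchronous parallel barrier, an iteration ends when the slowest worker finishes, and it uses a hypothetical rule that reassigns samples in proportion to worker speed. The function names, speeds, and sample counts are illustrative assumptions.

```python
def bsp_iteration_time(samples_per_worker, speeds):
    """Time of one barrier-synchronized iteration.

    Every worker waits at the barrier, so the iteration takes as long
    as the slowest worker (samples divided by samples-per-second).
    """
    return max(n / s for n, s in zip(samples_per_worker, speeds))


def proportional_reassignment(total_samples, speeds):
    """Hypothetical rebalancing rule (an assumption, not AdaptFR):
    give each worker a share of the samples proportional to its speed."""
    total_speed = sum(speeds)
    shares = [total_samples * s // total_speed for s in speeds]
    # Hand any rounding remainder to the fastest worker.
    shares[speeds.index(max(speeds))] += total_samples - sum(shares)
    return shares


# Illustrative cluster: three normal nodes and one temporarily slow node.
speeds = [100, 100, 100, 25]      # samples per second (assumed values)
total_samples = 1300

equal_split = [total_samples // len(speeds)] * len(speeds)
balanced_split = proportional_reassignment(total_samples, speeds)

print(bsp_iteration_time(equal_split, speeds))     # slow node dominates
print(bsp_iteration_time(balanced_split, speeds))  # all nodes finish together
```

With an equal split, the three fast nodes sit idle at the barrier until the slow node finishes; moving work away from the slow node shortens the iteration without changing the total amount of data processed.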

& Jian Wan [email protected] Mingwei Li [email protected]

1 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China

2 Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, Hangzhou 310018, China

3 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

4 School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, Zhejiang, China

5 Supercomputing Center of Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China

Jilin Zhang [email protected] Yongjian Ren [email protected] Li Zhou [email protected] Baofu Wu [email protected] Rui Yang [email protected] Jue Wang [email protected]


Wireless Networks

1 Introduction

Mobile service computing is an essential supporting technology for mobile internet and cloud computing [1], and its emergence adds dynamic and intelligent capabilities to distributed computing [2, 3]. In the mobile cloud computing mode, the mas