Dynamic Load Balancing in MPI Jobs



Abstract. There are at least three dimensions of overhead to be considered by any parallel job scheduling algorithm: load balancing, synchronization, and communication overhead. In this work we first study several heuristics to choose the next process to run from a global process queue. We then present a mechanism that decides at runtime whether to apply a local process queue per processor or a global process queue per job, depending on the load-balancing degree of the job, without any previous knowledge of it.

1 Introduction

Scheduling schemes for multiprogrammed parallel systems can be viewed at two levels: in the first, processors are allocated to a job; in the second, processes from the job are scheduled on this pool of processors. When a job has more processes than allocated processors, the processors must be shared among them. We work on message-passing parallel applications, using the MPI [16] library, running on shared-memory multiprocessors (SMMs). This library is widely used, even on SMMs, because of its performance portability to other platforms compared with programming models such as threads and OpenMP, and because it can fully exploit the underlying SMM architecture without careful consideration of data placement and synchronization. So the question arises whether to map processes to processors and then use a local queue on each one, or to have all the processors share a single global queue. Without shared memory, a global queue would be difficult to implement efficiently [8].

When scheduling jobs there are three types of overhead that must be taken into account: load balancing, needed to keep the resources busy most of the time; synchronization overhead, when scheduling message-passing parallel jobs; and communication overhead, when migrating processes among processors and losing locality. Previous work [24] studied the synchronization overhead generated when processes from a job need to synchronize with each other frequently, and proposed Self-Coscheduling, a scheme that combines the best benefits of Static Space Sharing and Co-Scheduling. The communication overhead is also studied in [25], where malleability is applied to MPI jobs.

In this work we present a mechanism, the Load Balancing Detector (LDB), to classify applications depending on their balance degree and to decide at runtime the appropriate process-queue type to apply to each job, without any previous knowledge of it. We first evaluate several heuristics to decide the next process to run from the global queue: by the number of unconsumed messages, by the process timestamp, by the sender of the recently blocked process, or by the process executed before the recently blocked one if it is ready. We obtained the best performance when selecting the sender of the recently blocked process or, if that is not possible, the one with the greatest number of unconsumed messages.

J. Labarta, K. Joe, and T. Sato (Eds.): ISHPC 2005 and ALPS 2006, LNCS 4759, pp. 117–129, 2008. © Springer-Verlag Berlin Heidelberg 2008
G. Utrera, J. Corbalán, and J. Labarta
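The best-performing selection heuristic can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the names `Proc`, `pick_next`, and the fields `ready`, `unconsumed`, and `timestamp` are assumptions introduced for the example. When a process blocks on a receive, the scheduler prefers the sender it is waiting for; if that sender is not ready, it falls back to the ready process with the most unconsumed messages.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    """Minimal stand-in for a schedulable MPI process (illustrative)."""
    pid: int
    ready: bool = True
    unconsumed: int = 0   # messages delivered to this process but not yet consumed
    timestamp: int = 0    # last time this process was dispatched

def pick_next(global_queue, blocked_sender=None):
    """Choose the next process to run from the global queue.

    Preference 1: the sender of the message the recently blocked process
    is waiting for, if it is ready (running it unblocks the receiver soonest).
    Preference 2: the ready process with the most unconsumed messages,
    since it is most likely holding up other receivers.
    """
    ready = [p for p in global_queue if p.ready]
    if not ready:
        return None
    if blocked_sender is not None and blocked_sender.ready:
        return blocked_sender
    return max(ready, key=lambda p: p.unconsumed)
```

A scheduler would call `pick_next` on every blocking event, passing the sender of the message the blocking process awaits when that information is available from the message-passing layer.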
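The runtime decision between a local queue per processor and a global queue per job can likewise be sketched. This is a simplified illustration of the idea, not the paper's LDB algorithm: the balance metric (mean work over maximum work) and the threshold value 0.8 are assumptions chosen for the example.

```python
def balance_degree(work_per_process):
    """Mean work divided by maximum work; 1.0 means perfectly balanced.

    `work_per_process` holds one measurement per process (e.g. CPU time
    consumed over the last observation interval) -- an assumed input format.
    """
    mx = max(work_per_process)
    if mx == 0:
        return 1.0
    return (sum(work_per_process) / len(work_per_process)) / mx

def choose_queue_type(work_per_process, threshold=0.8):
    """Pick a queue type from measured balance (threshold is illustrative).

    Balanced jobs keep locality with per-processor local queues; imbalanced
    jobs share a global queue so idle processors can pick up pending work.
    """
    if balance_degree(work_per_process) >= threshold:
        return "local"
    return "global"
```

Because the decision uses only runtime measurements, it requires no previous knowledge of the job, matching the requirement stated above.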