Improving utilization of heterogeneous clusters

Esteban Stafford¹ · José Luis Bosque¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Datacenters often combine sets of nodes with different capabilities, which leads to suboptimal resource utilization. One of the best ways to improve utilization is to balance the load while taking the heterogeneity of these clusters into account. This article presents a novel way of expressing computational capacity that is better suited to heterogeneous clusters, and also advocates task migration to further improve utilization. The experimental evaluation shows that both proposals are advantageous: they improve the utilization of heterogeneous clusters and reduce the makespan to 16.7% and 17.1%, respectively.

Keywords  Heterogeneous clusters · Utilization · Load index · Task migration
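As a rough illustration of the two ideas in the abstract, the sketch that follows models each node's computational capacity as a single relative speed factor. This is a hypothetical simplification: the article's own capacity measure and load index are not described in this section. The sketch shows a capacity-normalized load index, a split of tasks in proportion to capacity, and a simple test that migrates a task only when the time it saves exceeds the migration cost. All names (`Node`, `load_index`, `proportional_shares`, `migration_pays_off`) are illustrative, and the migration model assumes a task runs at its node's full speed, ignoring contention with other tasks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: float  # hypothetical relative speed (1.0 = reference node)
    running: int     # tasks currently running on the node


def load_index(node: Node) -> float:
    """Running tasks divided by relative capacity, so nodes of different speeds are comparable."""
    return node.running / node.capacity


def proportional_shares(nodes: list[Node], total_tasks: int) -> dict[str, int]:
    """Split total_tasks among nodes in proportion to capacity (largest-remainder rounding)."""
    total_capacity = sum(n.capacity for n in nodes)
    exact = {n.name: total_tasks * n.capacity / total_capacity for n in nodes}
    shares = {name: int(value) for name, value in exact.items()}
    leftover = total_tasks - sum(shares.values())
    # Hand the tasks lost to rounding down to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - shares[name], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


def migration_pays_off(remaining_work: float, src: Node, dst: Node,
                       migration_cost: float) -> bool:
    """Migrate only if the migrated run finishes sooner, migration cost included.

    remaining_work is measured in time units on a capacity-1.0 node; each task is
    assumed to run at its node's full speed (contention is ignored).
    """
    time_if_stay = remaining_work / src.capacity
    time_if_move = migration_cost + remaining_work / dst.capacity
    return time_if_move < time_if_stay


if __name__ == "__main__":
    nodes = [Node("old", 1.0, 4), Node("new", 2.0, 4)]
    print({n.name: load_index(n) for n in nodes})  # {'old': 4.0, 'new': 2.0}
    print(proportional_shares(nodes, 10))          # {'old': 3, 'new': 7}
    print(migration_pays_off(100.0, nodes[0], nodes[1], migration_cost=10.0))  # True
```

With one old node and one new node of twice its speed, both running four tasks, the new node reports half the load index, 10 new tasks are split 3/7, and a task with 100 time units of remaining work (measured on a speed-1.0 node) is moved to the faster node only if migrating it costs less than 50 units.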

1 Introduction

The fast evolution of computer architecture, together with the way datacenters acquire nodes in renovation campaigns spaced out over time, means that at any given moment a datacenter has several groups of nodes with different configurations and capabilities. Administrators typically manage these heterogeneous clusters by grouping nodes with identical configurations into separate partitions, so that each partition is homogeneous. Users are then left to decide which partition to submit their jobs to, which can lead to inefficiencies. Since users tend to submit to the partitions with the newest nodes, these become overused, while queues with older or less capable nodes are underexploited and remain idle for significant periods of time [1]. Keeping nodes powered on regardless of their occupancy wastes energy. Even if idle nodes are powered down, the cluster is still not used to its full potential. Moreover, power cycling is known to affect node reliability and increase maintenance costs [2, 3].

* Esteban Stafford · José Luis Bosque
¹ Department of Computer Science and Electronics, University of Cantabria, Santander, Spain

One of the most important challenges in leveraging the performance of these large systems is to consistently distribute the load among the available resources in proportion to their computing capacity [4–6]. This has traditionally been attempted with dynamic algorithms that adapt to the varying requirements of the workloads. However, since workload execution times are not limited to short bursts, achieving a perfect balance often requires relocating tasks that are already running. Task migration is costly, so it should only be undertaken when its benefit outweighs its cost [7, 8]. Achieving a performance gain through task migration is easier in a heterogeneous cluster, since it is possible to find faster nodes to send tasks to [9]. Furthermore, the fact that executi