A Fault-Tolerant Workflow Scheduling Algorithm for Grid with Near-Optimal Redundancy

  • PDF / 1,064,263 Bytes
  • 18 Pages / 547.087 x 737.008 pts Page_size
  • 34 Downloads / 157 Views

DOWNLOAD

REPORT


A Fault-Tolerant Workflow Scheduling Algorithm for Grid with Near-Optimal Redundancy Alemeh Matani & Hamid Reza Naji

& Hassan Motallebi

Received: 26 April 2018 / Accepted: 24 May 2020 # Springer Nature B.V. 2020

Abstract In scheduling workflows in grid environment, concerns such as minimizing the makespan and cost, meeting the time and budget constraints and the possibility of resource failures and so on have motivated researchers to propose numerous scheduling algorithms. Several heuristics and meta-heuristic algorithms have been proposed to address these issues, each of which often only considers one or a few of these criteria. However, less attention has been paid to fault-tolerant scheduling of workflows. Adding fault-tolerance to a workflow scheduling algorithm leads to an inevitable increase in the makespan and cost. Using the resubmission technique may result to an unacceptable increase in the execution time and possible violation of deadline while the replication method increases the execution cost. In this paper, we propose a fault-tolerant workflow scheduling algorithm with near-optimal time and cost overhead. The proposed approach brings a two-fold novelty. First, we assume a stochastic model of workflow with nondeterministic task parameters and use interval arithmetic to model task execution times and propose a new scheduling algorithm in which the A. Matani : H. R. Naji (*) : H. Motallebi Department of Information Technology, Graduate University of Advanced Technology, Kerman, Iran e-mail: [email protected]

A. Matani e-mail: [email protected] H. Motallebi e-mail: [email protected]

task assignment decisions are taken according to the performability fluctuations of the computational resources. Second, we employ an Efficient combination of resubmission and replication techniques to achieve the benefits of both and propose an algorithm for reliable scheduling of scientific workflows with nearoptimal additional time and cost. The proposed method, achieves a significant increase in the reliability while the additional execution time and cost is almost negligible. Keywords Workflow scheduling . Reliability . Redundancy . Replication . Resubmission

1 Introduction Many of the large-scale scientific grid applications can be naturally represented as workflows. A workflow is usually modelled as a directed acyclic graph (DAG) with nodes representing workflow tasks and edges representing the precedence constraints among them. Workflow scheduling is the problem of finding a mapping of workflow tasks into the available computational resources, so that the task precedence constraints are respected and probably some other requirements are met. Numerous workflow scheduling strategies have been proposed in literature with different aims and objective functions [1–4]. Some of these algorithms are designed with the objective of reducing the makespan [5–7] or cost [8] while some others are aimed to meet the specified budget [9, 10] and time [11, 12] constraints. While some works consider a fault-free env