Model-Based Failure Management for Distributed Reactive Systems

Abstract. Failure management is key to the development of safety-critical, distributed, reactive systems common in such applications as avionics, automotive, and sensor/actuator networks. Specific challenges to effective failure management include (i) developing an understanding of the application domain so as to define what constitutes a failure; (ii) disentangling failure management concepts at design and runtime; and (iii) detecting and mitigating failures at the level of systems-of-systems integration. In this paper, we address (i) and (ii) by developing a failure ontology for logical and deployment architectures, respectively, including a mapping between the two. This ontology is based on the interaction patterns (or services) defining the component interplay in a distributed system. We address (iii) by defining detectors and mitigators at the service/interaction level; we discuss how to derive detectors for a significant subset of the failure ontology directly from the interaction patterns. We demonstrate the utility of our techniques using a large-scale oceanographic sensor/actuator network.

Keywords: Failure Management, Distributed Systems, Ontology, Reactive Systems.
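To make the idea of a mapping between deployment-level and logical failures more concrete, the following is a minimal Python sketch, not the authors' ontology. It assumes that each logical component is deployed on exactly one physical node, that each service lists the logical components taking part in its interaction, and that a failure hypothesis simply names the nodes that may fail. The class `FailureHypothesis`, its methods, and the example component and node names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FailureHypothesis:
    """Hypothetical model linking deployment-level and logical failures."""

    # Logical component -> physical node it is deployed on.
    deployment: dict[str, str]
    # Service (interaction pattern) -> logical components taking part in it.
    services: dict[str, set[str]]
    # Physical nodes that the failure hypothesis admits may fail.
    failable_nodes: set[str] = field(default_factory=set)

    def affected_components(self, failed_node: str) -> set[str]:
        """Map a failed physical node to the logical components deployed on it."""
        if failed_node not in self.failable_nodes:
            raise ValueError(f"{failed_node!r} is not covered by the failure hypothesis")
        return {comp for comp, node in self.deployment.items() if node == failed_node}

    def affected_services(self, failed_node: str) -> set[str]:
        """Services in which at least one affected component participates."""
        hit = self.affected_components(failed_node)
        return {svc for svc, parts in self.services.items() if parts & hit}


if __name__ == "__main__":
    fh = FailureHypothesis(
        deployment={"Sensor": "buoy1", "Actuator": "buoy1", "Controller": "shore"},
        services={
            "Sample": {"Sensor", "Controller"},
            "Adjust": {"Controller", "Actuator"},
            "Log": {"Controller"},
        },
        failable_nodes={"buoy1"},
    )
    print(sorted(fh.affected_components("buoy1")))  # ['Actuator', 'Sensor']
    print(sorted(fh.affected_services("buoy1")))    # ['Adjust', 'Sample']
```

Under these assumptions, a failure of the node hosting both the sensor and the actuator surfaces at the logical level as a failure of every service involving either component, while a service confined to the shore controller is unaffected. Rejecting nodes outside the hypothesis reflects the idea that the hypothesis bounds which physical entities are considered able to fail.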

1 Introduction

Failures can cause serious harm in many application domains. In domains such as avionics, automotive, and plant control, lives often depend on the correct functionality of software systems. One of the most challenging tasks of system developers is to ensure that the system both delivers the expected functionalities and is resilient to failures. We advocate the use of a combination of elements from Model Driven Architecture (MDA) [1] and Service-Oriented Architectures (SOA) to disentangle functional aspects of system behavior from the treatment of failures. The basic building block of our approach is the service. Services capture interaction patterns between system entities. Our approach leverages the interaction descriptions captured by services to identify, at run time, deviations from the specified behavior. An ontology guides the identification of failures and the activation of additional services that mitigate the effects of such failures.

F. Kordon and O. Sokolsky (Eds.): Monterey Workshop 2006, LNCS 4888, pp. 53–74, 2007. © Springer-Verlag Berlin Heidelberg 2007
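The runtime idea described in the introduction, a detector derived from a service's interaction description combined with a lookup that activates a mitigating service, can be sketched in Python as follows. This is an illustration under simplifying assumptions rather than the paper's technique: an interaction pattern is reduced to a linear sequence of expected messages with per-step deadlines, only two failure kinds are distinguished, and the names `Step`, `InteractionDetector`, `FailureKind`, `MITIGATIONS`, and the mitigation service names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureKind(Enum):
    # Hypothetical, simplified failure categories (not the paper's ontology).
    UNEXPECTED_MESSAGE = "unexpected message"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class Step:
    """One expected message of a (linear) interaction pattern."""
    sender: str
    receiver: str
    message: str
    deadline: float  # max time allowed since the previous step


class InteractionDetector:
    """Runtime detector derived from a single expected message sequence."""

    def __init__(self, pattern: list[Step], start_time: float = 0.0) -> None:
        self.pattern = pattern
        self.index = 0               # position of the next expected step
        self.last_time = start_time  # time of the last accepted message

    @property
    def complete(self) -> bool:
        return self.index == len(self.pattern)

    def observe(self, sender: str, receiver: str, message: str,
                timestamp: float) -> Optional[FailureKind]:
        """Check one observed message; return a failure kind or None."""
        if self.complete:
            return FailureKind.UNEXPECTED_MESSAGE  # nothing more was expected
        step = self.pattern[self.index]
        if (sender, receiver, message) != (step.sender, step.receiver, step.message):
            return FailureKind.UNEXPECTED_MESSAGE
        if timestamp - self.last_time > step.deadline:
            return FailureKind.TIMEOUT  # right message, but too late
        self.index += 1
        self.last_time = timestamp
        return None

    def poll(self, now: float) -> Optional[FailureKind]:
        """Called periodically to detect an expected message that never arrives."""
        if not self.complete and now - self.last_time > self.pattern[self.index].deadline:
            return FailureKind.TIMEOUT
        return None


# Hypothetical mapping from a detected failure kind to a mitigation service.
MITIGATIONS = {
    FailureKind.UNEXPECTED_MESSAGE: "RestartInteraction",
    FailureKind.TIMEOUT: "SwitchToBackup",
}


def mitigate(kind: FailureKind) -> str:
    """Return the name of the mitigation service to activate."""
    return MITIGATIONS[kind]


if __name__ == "__main__":
    pattern = [
        Step("Sensor", "Controller", "reading", deadline=5.0),
        Step("Controller", "Actuator", "command", deadline=2.0),
    ]
    detector = InteractionDetector(pattern)
    print(detector.observe("Sensor", "Controller", "reading", timestamp=1.0))  # None
    failure = detector.poll(now=4.0)  # "command" overdue: 4.0 - 1.0 > 2.0
    print(failure.value, mitigate(failure))  # timeout SwitchToBackup
```

Keeping detection and mitigation as separate pieces mirrors the goal of disentangling functional behavior from failure treatment: the pattern describes only the expected interaction, and the table decides what to activate when it is violated. The interaction models discussed in the paper are hierarchically composed rather than linear, so a faithful detector would need a richer state machine than this sequence.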

V. Ermagan, I. Krüger, and M. Menarini

Fig. 1. Model-based failure-management approach

Figure 1 outlines the model-based failure management approach we propose. The figure shows the two main elements of our approach. We leverage an ontology, encompassing a failure taxonomy, service-oriented models, deployment models, and the mapping between them, to inform an MDA approach. We enrich the logical and deployment models typical of any MDA with a failure hypothesis. This additional artifact, based on the failure ontology, captures which physical and logical entities can fail in a system. It also provides a formal basis to reason about system correctness in the presence of failures. Our SOA models are based on hierarchically composed interaction models, extending the servi