Model-Based Failure Management for Distributed Reactive Systems

Abstract. Failure management is key to the development of safety-critical, distributed, reactive systems common in such applications as avionics, automotive, and sensor/actuator networks. Specific challenges to effective failure management include (i) developing an understanding of the application domain so as to define what constitutes a failure; (ii) disentangling failure management concepts at design time and at runtime; and (iii) detecting and mitigating failures at the level of systems-of-systems integration. In this paper, we address (i) and (ii) by developing a failure ontology for logical and deployment architectures, respectively, including a mapping between the two. This ontology is based on the interaction patterns (or services) defining the component interplay in a distributed system. We address (iii) by defining detectors and mitigators at the service/interaction level, and we discuss how to derive detectors for a significant subset of the failure ontology directly from the interaction patterns. We demonstrate the utility of our techniques using a large-scale oceanographic sensor/actuator network.

Keywords: Failure Management, Distributed Systems, Ontology, Reactive Systems.

1 Introduction

Failures can cause serious harm in many application domains. In domains such as avionics, automotive, and plant control, lives often depend on the correct functionality of software systems. One of the most challenging tasks for system developers is to ensure that the system both delivers the expected functionality and is resilient to failures. We advocate combining elements of Model Driven Architecture (MDA) [1] and Service-Oriented Architectures (SOA) to disentangle the functional aspects of system behavior from the treatment of failures. The basic building block of our approach is the service. Services capture interaction patterns between system entities. Our approach leverages the interaction descriptions captured by services to identify, at run time, deviations from the specified behavior. An ontology guides the identification of failures and the activation of additional services that mitigate the effects of such failures.

F. Kordon and O. Sokolsky (Eds.): Monterey Workshop 2006, LNCS 4888, pp. 53–74, 2007. © Springer-Verlag Berlin Heidelberg 2007
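To make the idea of detecting run-time deviations from an interaction specification more concrete, the following Python sketch monitors a single run of a linear message sequence. It is an illustration under stated assumptions, not the detector construction used in this paper: the `Message` and `InteractionDetector` names, the encoding of a service as an ordered list of messages, the single `timeout` bound, and the Sensor/Controller/Actuator roles are all hypothetical. Two kinds of deviation are flagged: an observed message that does not match the next expected one, and an expected message that is overdue.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    """One arrow of an interaction: sender -> receiver: name (hypothetical encoding)."""
    sender: str
    receiver: str
    name: str


class InteractionDetector:
    """Hypothetical runtime detector for one run of a linear interaction pattern.

    Reports an observed message that differs from the next expected one, and an
    expected message that has not arrived within `timeout` time units.
    """

    def __init__(self, pattern: List[Message], timeout: float) -> None:
        self.pattern = pattern
        self.timeout = timeout
        self.position = 0      # index of the next expected message
        self.last_time = 0.0   # time of the last matching message (run starts at 0)
        self.failures: List[Tuple[float, str]] = []

    def _expected(self) -> Optional[Message]:
        if self.position < len(self.pattern):
            return self.pattern[self.position]
        return None

    def tick(self, now: float) -> None:
        """Report an omission if the next expected message is overdue."""
        expected = self._expected()
        if expected is not None and now - self.last_time > self.timeout:
            self.failures.append((now, f"timeout: {expected.name} not received"))
            self.last_time = now  # restart the timer so the report is not repeated

    def observe(self, now: float, msg: Message) -> None:
        """Check one observed message against the pattern."""
        self.tick(now)
        expected = self._expected()
        if expected is None:
            self.failures.append((now, f"unexpected: {msg.name} after pattern completed"))
        elif msg != expected:
            # A mismatch is reported but does not advance the pattern.
            self.failures.append((now, f"unexpected: got {msg.name}, expected {expected.name}"))
        else:
            self.position += 1
            self.last_time = now


# Hypothetical sensor/actuator interaction: reading, command, acknowledgement.
pattern = [
    Message("Sensor", "Controller", "reading"),
    Message("Controller", "Actuator", "command"),
    Message("Actuator", "Controller", "ack"),
]
det = InteractionDetector(pattern, timeout=2.0)
det.observe(0.5, Message("Sensor", "Controller", "reading"))
det.observe(1.0, Message("Controller", "Actuator", "command"))
det.tick(4.0)  # the ack is overdue: 4.0 - 1.0 > 2.0
print(det.failures)  # [(4.0, 'timeout: ack not received')]
```

A richer service specification would typically allow alternatives, loops, and concurrent branches, so a detector derived from it would need to track a set of admissible states (for example, with an automaton) rather than a single list index.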

V. Ermagan, I. Krüger, and M. Menarini

Fig. 1. Model-based failure-management approach

Figure 1 outlines the model-based failure management approach we propose. The figure shows the two main elements of our approach. We leverage an ontology, encompassing a failure taxonomy, service-oriented models, deployment models, and the mapping between them, to inform an MDA approach. We enrich the logical and deployment models typical of any MDA with a failure hypothesis. This additional artifact, based on the failure ontology, captures which physical and logical entities can fail in a system. It also provides a formal basis for reasoning about system correctness in the presence of failures. Our SOA models are based on hierarchically composed interaction models, extending the servi
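The role of the failure hypothesis and of the logical-to-deployment mapping can be illustrated with a small Python sketch. Everything in it is hypothetical: the service names, the logical roles, the nodes `buoy_1`, `buoy_2`, and `shore_station`, and the choice to represent the hypothesis as a plain set of nodes that may fail. Given a set of failed physical nodes, `affected_services` traces the failure back through the deployment mapping to the logical services whose participants are hosted on those nodes, and it rejects failures that the hypothesis does not cover, since no correctness argument was made for them.

```python
from typing import Dict, List, Set

# Hypothetical logical model: each service lists the logical roles it involves.
SERVICES: Dict[str, Set[str]] = {
    "sample_upload": {"Sensor", "Gateway", "DataStore"},
    "actuate_winch": {"Controller", "Winch"},
}

# Hypothetical deployment model: logical role -> physical node hosting it.
DEPLOYMENT: Dict[str, str] = {
    "Sensor": "buoy_1",
    "Gateway": "buoy_1",
    "Winch": "buoy_2",
    "Controller": "shore_station",
    "DataStore": "shore_station",
}

# Hypothetical failure hypothesis: the physical nodes assumed to be able to fail.
FAILURE_HYPOTHESIS: Set[str] = {"buoy_1", "buoy_2"}


def affected_services(failed_nodes: Set[str]) -> List[str]:
    """Map deployment-level node failures back to the logical services they affect."""
    uncovered = failed_nodes - FAILURE_HYPOTHESIS
    if uncovered:
        # Outside the hypothesis, no claim about correct behavior was made.
        raise ValueError(f"failures outside the hypothesis: {sorted(uncovered)}")
    failed_roles = {role for role, node in DEPLOYMENT.items() if node in failed_nodes}
    return sorted(name for name, roles in SERVICES.items() if roles & failed_roles)


print(affected_services({"buoy_2"}))  # ['actuate_winch']
```

The paragraph above states that the failure hypothesis covers both physical and logical entities; this sketch models only crashes of physical nodes.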