Software health management with Bayesian networks


SI: SwHM

Johann Schumann · Timmy Mbaya · Ole Mengshoel · Knot Pipatsrisawat · Ashok Srivastava · Arthur Choi · Adnan Darwiche

J. Schumann (corresponding author), SGT, Inc., NASA Ames Research Center, Moffett Field, CA, USA
T. Mbaya, RIACS/USRA, Mountain View, CA, USA
O. Mengshoel, Carnegie Mellon University, Silicon Valley Campus, Moffett Field, CA, USA
K. Pipatsrisawat · A. Choi · A. Darwiche, University of California, Los Angeles, CA, USA
A. Srivastava, NASA Ames Research Center, Moffett Field, CA, USA

Received: 7 February 2012 / Accepted: 10 May 2013 © Springer-Verlag London (outside the USA) 2013

Abstract Software health management (SWHM) is an emerging field that addresses the critical need to detect, diagnose, predict, and mitigate adverse events caused by software faults and failures. These faults can arise for numerous reasons, including coding errors, unanticipated faults or failures in hardware, or problematic interactions with the external environment. This paper demonstrates a novel approach to software health management based on a rigorous Bayesian formulation that monitors the behavior of the software and the operating system, performs probabilistic diagnosis, and provides information about the most likely root causes of a failure or software problem. Translating the Bayesian network model into an efficient data structure, an arithmetic circuit, makes it possible to perform SWHM on resource-restricted embedded computing platforms such as those found in aircraft, unmanned aircraft, or satellites. SWHM is especially important for safety-critical systems such as aircraft control systems. In this paper, we demonstrate our Bayesian SWHM system on three realistic scenarios from an aircraft control system: (1) aircraft file-system-based faults, (2) signal handling faults, and (3) navigation faults due to inertial measurement unit (IMU) failure or compromised Global Positioning System (GPS) integrity. We show that the method successfully detects and diagnoses faults in these scenarios. We also discuss the importance of verification and validation of SWHM systems.

Keywords Software health management · Fault detection and diagnosis · Aircraft control system · Bayesian network · Probabilistic diagnosis
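The abstract's central implementation idea is that the Bayesian network is compiled into an arithmetic circuit, which is then evaluated under the observed evidence to obtain diagnostic probabilities. The following minimal Python sketch illustrates that idea on a hypothetical two-node network in which a fault F may trigger a monitor alarm S. The node names, the probabilities, and the hand-built circuit (classes `Param`, `Indicator`, `Add`, `Mul` and helpers `evaluate`, `alarm_given`, `posterior`) are assumptions made for illustration; they are not the authors' model or their compiler. The circuit is a sum-of-products form of the network polynomial, in which evidence indicators (lambda) are multiplied with network parameters (theta), so that its value under a given evidence setting equals the probability of that evidence.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical node types for a tiny arithmetic circuit (illustration only).
@dataclass
class Param:
    value: float  # network parameter theta, e.g. P(S=1 | F=1)

@dataclass
class Indicator:
    var: str    # variable name, e.g. "F"
    state: int  # the state x this indicator lambda_{var=x} stands for

@dataclass
class Add:
    children: List["Node"]

@dataclass
class Mul:
    children: List["Node"]

Node = Union[Param, Indicator, Add, Mul]


def evaluate(node: Node, evidence: Dict[str, int]) -> float:
    """Evaluate the circuit bottom-up under the given evidence.

    An indicator lambda_{X=x} is 1 if X is unobserved or observed as x, else 0,
    so the circuit's value is the probability of the evidence.
    """
    if isinstance(node, Param):
        return node.value
    if isinstance(node, Indicator):
        observed = evidence.get(node.var)
        return 1.0 if observed is None or observed == node.state else 0.0
    if isinstance(node, Add):
        return sum(evaluate(child, evidence) for child in node.children)
    if isinstance(node, Mul):
        product = 1.0
        for child in node.children:
            product *= evaluate(child, evidence)
        return product
    raise TypeError(f"unknown node type: {type(node).__name__}")


def alarm_given(p_alarm: float) -> Add:
    """Sub-circuit summing over S: lambda_{S=1}*theta + lambda_{S=0}*(1-theta)."""
    return Add([
        Mul([Indicator("S", 1), Param(p_alarm)]),
        Mul([Indicator("S", 0), Param(1.0 - p_alarm)]),
    ])


# Assumed toy model F -> S: P(F=1)=0.01, P(S=1|F=1)=0.90, P(S=1|F=0)=0.05.
circuit = Add([
    Mul([Indicator("F", 1), Param(0.01), alarm_given(0.90)]),
    Mul([Indicator("F", 0), Param(0.99), alarm_given(0.05)]),
])


def posterior(root: Node, var: str, states: List[int],
              evidence: Dict[str, int]) -> Dict[int, float]:
    """P(var = x | evidence) for each x, as a ratio of two circuit evaluations."""
    p_evidence = evaluate(root, evidence)
    return {x: evaluate(root, {**evidence, var: x}) / p_evidence for x in states}


if __name__ == "__main__":
    # Diagnose the fault after observing the alarm.
    print(posterior(circuit, "F", [0, 1], {"S": 1}))
```

With these assumed numbers, observing the alarm raises the probability of the fault from its prior of 0.01 to roughly 0.15 (0.009 / 0.0585). In a deployed system the circuit would be generated from the full Bayesian network rather than written by hand; evaluation nevertheless remains a fixed pass of additions and multiplications over a precompiled structure, which is the property that makes arithmetic circuits attractive for embedded targets.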

1 Introduction: software health management

Modern aircraft increasingly rely on highly safety-critical functions (e.g., aircraft control, auto-throttle, autopilot, communications) implemented in software for digital control (fly-by-wire). Despite strict certification rules (e.g., DO-178C [66] for civil aviation) and immense efforts to perform verification and validation (V&V) on the software during its development, software failures still occur, threatening the mission as well as the safety and lives of passengers and crew. Such failures are typically caused by latent bugs in the code or unex