Software Design for Resilient Computer Systems

This book addresses the question of how system software should be designed to account for faults, and which fault tolerance features it should provide for highest reliability. The authors first show how the system software interacts with the hardware to t





Igor Schagaev · Thomas Kaegi-Trachsel


Igor Schagaev, IT-ACS Ltd., Stevenage, UK

Thomas Kaegi-Trachsel, IT-ACS Ltd., Stevenage, UK

ISBN 978-3-319-29463-6
ISBN 978-3-319-29465-0 (eBook)
DOI 10.1007/978-3-319-29465-0

Library of Congress Control Number: 2016930674

© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by SpringerNature
The registered company is Springer International Publishing AG Switzerland

Preface

Nowadays computer systems are applied in safety-critical areas such as military, aviation, intensive health care, industrial control, and space exploration. All these areas demand the highest possible reliability of functional operation. However, ionized particles and radiation, thermal instability, and various other external factors all affect current semiconductor hardware, and this inevitably leads to faults in the system. Such phenomena are expected to be observed more often in the future because of the ongoing miniaturization of hardware structures.

In this book, we tackle the question of how system software should be designed to support, handle, and recover hardware in the event of such faults, and which fault tolerance schemes system software should provide for the highest reliability. We also show how the system software interacts with the hardware to tolerate these faults.

First, we analyze and further develop the theory of fault tolerance to understand the ways to increase the reliability of a system. Ultimately, the key is to use redundancy in all its different forms. We revise and further develop the general algorithm of fault tolerance (GAFT) with its three main processes of hardware checking, preparation for recovery, and recovery procedure, and explain how our approach to the desi