Fast Poisson Solvers



Fast Multipole Method (FMM): see N-Body Computational Methods

Fast Poisson Solvers: see Rapid Elliptic Solvers

Fat Tree: see Networks, Multistage

Fault Tolerance

Hans P. Zima, Allen Nikora
California Institute of Technology, Pasadena, CA, USA

Definition

Fault tolerance denotes the capability of a system to adhere to its specification and deliver correct service in the presence of faults.

Discussion

Introduction

Modern society relies increasingly on highly complex computing systems across a wide spectrum of applications, ranging from commercial transactions to the control of transportation and communication systems, the electrical power grid, nuclear reactors, and space missions. In many cases, human lives and huge material values depend on the correct functioning of these systems even under adverse conditions. Despite the successes of verification and validation technology over the past decades, theoretical as well as practical studies convincingly demonstrate that large systems typically do contain design faults, even when subject to the strictest development disciplines. Moreover, even a perfectly designed system may be subject to external faults, such as radiation effects and operator errors. As a consequence, it is essential to provide methods that avoid system failure and maintain the functionality of a system, possibly with degraded performance, even in the case of faults. This is called fault tolerance.

A fault is defined as a defect in a system that may cause an error during its operation. If an error affects the service to be provided by a system, a failure occurs.

Fault-tolerant systems were built long before the advent of the digital computer, based on the use of replication, diversified design, and federation of equipment. In an article on Babbage’s difference engine, Dionysius Lardner wrote: “The most certain and effectual check upon errors which arise in the process of computation is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computation by different methods.” Early fault-tolerant computers include NASA’s Self-Testing-and-Repairing (STAR) computer, developed for a multiyear mission to the outer planets, and the computers onboard the Jet Propulsion Laboratory’s Voyager systems.

Today, highly sophisticated fault-tolerant computing systems control the new generation of fly-by-wire aircraft, such as the Airbus and Boeing airliners, protecting against design as well as hardware faults. Perhaps the most widespread use of fault-tolerant computing has been in the area of commercial transaction systems, such as automatic teller machines and airline reservation systems.
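The check that Lardner describes, repeating a computation with independent "computers" that use different methods, can be illustrated with a small sketch. The example below is an assumption-laden illustration and not a method taken from this article: the function names (`sum_by_loop`, `sum_by_formula`, `sum_by_pairing`, `faulty_sum`, `vote`) are hypothetical, and the majority-voting rule is one simple way to compare the results of diverse versions.

```python
from collections import Counter
from typing import Callable, Sequence


def sum_by_loop(n: int) -> int:
    """Sum 1..n by direct iteration."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_by_formula(n: int) -> int:
    """Sum 1..n with the closed form n(n+1)/2."""
    return n * (n + 1) // 2


def sum_by_pairing(n: int) -> int:
    """Sum 1..n by pairing terms from both ends of the range."""
    lo, hi, total = 1, n, 0
    while lo < hi:
        total += lo + hi
        lo += 1
        hi -= 1
    if lo == hi:  # middle term when n is odd
        total += lo
    return total


def faulty_sum(n: int) -> int:
    """Deliberately wrong version, standing in for a design fault."""
    return n * (n - 1) // 2


def vote(versions: Sequence[Callable[[int], int]], n: int) -> int:
    """Run every version on the same input and return the strict-majority result.

    Raises RuntimeError when no result is produced by more than half of the
    versions, so a disagreement is reported instead of silently masked.
    """
    results = [version(n) for version in versions]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError(f"no majority among results: {results}")
    return value


if __name__ == "__main__":
    # Two correct versions outvote the faulty one.
    print(vote([sum_by_loop, sum_by_formula, faulty_sum], 10))
```

With three versions, a fault in any single one is masked by the other two. With only two versions, a disagreement can be detected but not resolved, which is why `vote` raises an error in that case. The sketch assumes that the versions fail independently; a fault shared by a majority of versions would go unnoticed.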

David Padua (ed.), Encyclopedia of Parallel Computing, DOI ./----, © Springer Science+Business Media, LLC 




The focus of this article is on software methods for dealing with faults that may arise in hardware or software systems. Before entering into a discussion on fault tol