Transient and Permanent Error Control for Networks-on-Chip

This book addresses the reliability and energy efficiency of on-chip networks using a configurable error control coding (ECC) scheme for datalink-layer transient error management. The method can adjust both error detection and correction strengths at runtime.


Qiaoyan Yu

Paul Ampadu

Qiaoyan Yu, University of New Hampshire, Durham, NH 03824, USA

Paul Ampadu, University of Rochester, Rochester, NY 14627, USA

ISBN 978-1-4614-0961-8
e-ISBN 978-1-4614-0962-5
DOI 10.1007/978-1-4614-0962-5
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2011939749
© Springer Science+Business Media, LLC 2012

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Reliability has become one of the most important metrics for on-chip communication infrastructures in nanoscale technologies. Reduced supply voltages and high clock frequencies exacerbate the impact of noise sources such as particle strikes and crosstalk, which can cause transient errors in transmitted data. Additionally, manufacturing defects, electromigration, and aging can cause permanent errors in communication links. Unfortunately, transient and permanent error management techniques typically increase power consumption, latency, and area overhead, further challenging large-scale system design. Consequently, cost-effective techniques for improving on-chip error resilience are needed.

The purpose of this book is to address the reliability and energy issues of nanoscale on-chip networks. Since the noise environment is not constant in real applications, the commonly used worst-case design approach wastes energy, particularly when the noise condition is favorable. To address the variable error rates, we present a configurable error control coding (ECC) scheme for datalink-layer transient error management. The method can adjust both error detection and correction strengths at runtime by varying the number of redundant wires for parity-check bits. To further improve energy efficiency, the ECC adaptation is extended to the network layer. We demonstrate that the proposed dual-layer cooperative error control achieves better reliability, latency, and energy efficiency than other solutions in a wide range of noise and traffic conditions, at moderate area costs. We further extend these methods to tackle joint transient and permanent error correction, exploiting redundant resources already available. This approach r