skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: High-reliability computing for the smarter planet

Conference ·
DOI:https://doi.org/10.1063/1.3586185· OSTI ID:1026827
 [1];  [1];  [2];  [3];  [4]
  1. Los Alamos National Laboratory
  2. UNIV OF PADOVA
  3. UNIV OF PENN
  4. INTEL CORPORATION

The geometric rate of improvement of transistor size and integrated circuit performance, known as Moore's Law, has been an engine of growth for our economy, enabling new products and services, creating new value and wealth, increasing safety, and removing menial tasks from our daily lives. Affordable, highly integrated components have enabled both life-saving technologies and rich entertainment applications. Anti-lock brakes, insulin monitors, and GPS-enabled emergency response systems save lives. Cell phones, internet appliances, virtual worlds, realistic video games, and mp3 players enrich our lives and connect us together. Over the past 40 years of silicon scaling, the increasing capabilities of inexpensive computation have transformed our society through automation and ubiquitous communications. In this paper, we will present the concept of the smarter planet, how reliability failures affect current systems, and methods that can be used to increase the reliable adoption of new automation in the future. We will illustrate these issues using a number of different electronic devices in a couple of different scenarios. Recently IBM has been presenting the idea of a 'smarter planet.' In smarter planet documents, IBM discusses increased computer automation of roadways, banking, healthcare, and infrastructure, as automation could create more efficient systems. A necessary component of the smarter planet concept is to ensure that these new systems have very high reliability. Even extremely rare reliability problems can easily escalate to problematic scenarios when implemented at very large scales. For life-critical systems, such as automobiles, infrastructure, medical implantables, and avionic systems, unmitigated failures could be dangerous. As more automation moves into these types of critical systems, reliability failures will need to be managed. As computer automation continues to increase in our society, the need for greater radiation reliability is necessary. Already critical infrastructure is failing too frequently. In this paper, we will introduce the Cross-Layer Reliability concept for designing more reliable computer systems.

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC52-06NA25396
OSTI ID:
1026827
Report Number(s):
LA-UR-10-05313; LA-UR-10-5313; TRN: US1105141
Resource Relation:
Conference: 21st Int'l Conf on the Application of Accelerators in Research and Industry ; August 9, 2010 ; Fort Worth, TX
Country of Publication:
United States
Language:
English