Implementation and Evaluation of a Scalable Application-level Checkpoint-Recovery Scheme for MPI
 

Summary: Implementation and Evaluation of a Scalable Application-level Checkpoint-Recovery Scheme for MPI Programs
Martin Schulz

Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
Livermore, CA 94551
schulzm@llnl.gov
Greg Bronevetsky, Rohit Fernandes, Daniel Marques, Keshav Pingali, Paul Stodghill

Department of Computer Science
Cornell University
Ithaca, NY 14853
{bronevet,rohitf,marques,pingali,stodghil}@cs.cornell.edu
ABSTRACT
The running times of many computational science applications are
much longer than the mean-time-to-failure of current high-performance
computing platforms. To run to completion, such applications must
tolerate hardware failures.

  

Source: Agrawal, Gagan - Department of Computer Science and Engineering, Ohio State University

 

Collections: Computer Technologies and Information Sciences