The design and implementation of Berkeley Lab's Linux checkpoint/restart
This paper describes Berkeley Lab Checkpoint/Restart (BLCR), a Linux kernel module that allows system-level checkpoints on a variety of Linux systems. BLCR can be used either as a standalone system for checkpointing applications on a single machine, or as a component used by a scheduling system or parallel communication library to checkpoint and restore parallel jobs running on multiple machines. Integration with the Message Passing Interface (MPI) and other parallel systems is described.
- Research Organization:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Director. Office of Science. Office of Advanced Scientific Computing Research
- DOE Contract Number:
- DE-AC02-05CH11231
- OSTI ID:
- 891617
- Report Number(s):
- LBNL-54941; R&D Project: KS3210; BnR: KJ0101030; TRN: US200622%%259
- Country of Publication:
- United States
- Language:
- English