The design and implementation of Berkeley Lab's Linux checkpoint/restart
This paper describes Berkeley Linux Checkpoint/Restart (BLCR), a Linux kernel module that allows system-level checkpoints on a variety of Linux systems. BLCR can be used either as a standalone system for checkpointing applications on a single machine, or as a component of a scheduling system or parallel communication library for checkpointing and restoring parallel jobs running on multiple machines. Integration with the Message Passing Interface (MPI) and other parallel systems is also described.
- Research Organization: Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
- Sponsoring Organization: USDOE Director. Office of Science. Office of Advanced Scientific Computing Research
- DOE Contract Number: AC02-05CH11231
- OSTI ID: 891617
- Report Number(s): LBNL--54941; BnR: KJ0101030
- Country of Publication: United States
- Language: English
Similar Records
Berkeley Lab Checkpoint/Restart for Linux
Berkeley lab checkpoint/restart (BLCR) for Linux clusters
Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters
Software · Fri Nov 14 19:00:00 EST 2003 · OSTI ID:code-54577
Berkeley lab checkpoint/restart (BLCR) for Linux clusters
Journal Article · Fri Sep 01 00:00:00 EDT 2006 · Journal of Physics. Conference Series · OSTI ID:1407049
Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters
Journal Article · Wed Jul 26 00:00:00 EDT 2006 · Journal of Physics: Conference Series · OSTI ID:926560