Compiler-Enhanced Incremental Checkpointing for OpenMP Applications
As modern supercomputing systems reach the petaflop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety of causes. Checkpointing is a popular technique for tolerating such failures: it enables applications to periodically save their state and restart computation after a failure. Although a variety of automated system-level checkpointing solutions are currently available to HPC users, manual application-level checkpointing remains more popular due to its superior performance. This paper improves the performance of automated checkpointing through a compiler analysis for incremental checkpointing, which saves only the data modified since the previous checkpoint. This analysis, which works with both sequential and OpenMP applications, reduces checkpoint sizes by as much as 80% and enables asynchronous checkpointing.
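The abstract does not describe the analysis itself. To illustrate only the general idea of incremental checkpointing, the sketch below saves just the blocks written since the previous checkpoint. It tracks writes at run time. The paper instead derives this information through compiler analysis. The class `IncrementalCheckpointer`, its block layout, and the in-memory "storage" are hypothetical and chosen for illustration.

```python
class IncrementalCheckpointer:
    """Toy incremental checkpointing: save only blocks written since the last checkpoint."""

    def __init__(self, num_blocks, block_size):
        self.state = [[0.0] * block_size for _ in range(num_blocks)]
        # Every block starts dirty, so the first checkpoint is a full one.
        self.dirty = set(range(num_blocks))
        self.saved = [None] * num_blocks  # simulated stable storage

    def write(self, block, idx, value):
        # Runtime write tracking (assumption for this sketch); a compiler
        # analysis could instead determine statically which blocks may change.
        self.state[block][idx] = value
        self.dirty.add(block)

    def checkpoint(self):
        """Copy dirty blocks to storage and return how many were saved."""
        for b in self.dirty:
            self.saved[b] = list(self.state[b])
        count = len(self.dirty)
        self.dirty.clear()
        return count

    def restore(self):
        """Rebuild the state from the most recent saved copy of every block."""
        self.state = [list(blk) for blk in self.saved]
        self.dirty.clear()


ckpt = IncrementalCheckpointer(num_blocks=8, block_size=4)
print(ckpt.checkpoint())  # 8: the first checkpoint saves every block
ckpt.write(2, 0, 3.5)
ckpt.write(5, 1, -1.0)
ckpt.write(2, 3, 7.0)
print(ckpt.checkpoint())  # 2: only blocks 2 and 5 were modified
ckpt.state[2][0] = 0.0    # simulate lost state (bypasses write tracking)
ckpt.restore()
print(ckpt.state[2][0])   # 3.5
```

Here the second checkpoint copies 2 of 8 blocks rather than all 8. This kind of size reduction is what incremental checkpointing aims for, though real savings depend on the application's write pattern.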
- Research Organization: Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: W-7405-ENG-48
- OSTI ID: 944289
- Report Number(s): LLNL-CONF-400662; TRN: US200902%%629
- Resource Relation: Conference: Presented at: International Conference on Supercomputing, Kos, Greece, Jun 07 - Jun 12, 2008
- Country of Publication: United States
- Language: English
Similar Records
Parallelization and checkpointing of GPU applications through program transformation
...And Eat it Too: High Read Performance in Write-Optimized HPC I/O Middleware File Formats