Techniques for simplifying the programming of distributed systems
It is difficult to design and verify distributed programs that execute correctly despite transient processor failures, or despite variable and unpredictable processor speeds and message transmission times. This thesis describes a checkpointing/rollback mechanism that allows programmers to write distributed programs under the simplifying assumption that processors do not fail, and then run these programs correctly on systems with transient processor failures. It also describes a translation mechanism that lets programmers write programs under the simplifying assumptions that processors execute in synchronized steps and that every message takes exactly one step to arrive, and then run these programs correctly on systems that violate these assumptions. Both mechanisms are transparent to the programmer, and they can be applied to solve a large class of problems.
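The abstract does not say how the translation mechanism works. For orientation only, here is a minimal sketch of one generic, textbook way to run a lockstep-round program on an asynchronous network: a simple synchronizer. Each message is tagged with its round number. Every node sends to every neighbor in every round, using `None` as an explicit "no message" marker. A node starts round r only after it holds round r-1 messages from all of its neighbors. This is an assumed, swapped-in technique and not necessarily the thesis's mechanism, and it does not model failures or checkpointing. The names `run_lockstep`, `run_synchronized`, and `max_flood`, and the random-delay model, are hypothetical.

```python
import heapq
import random
from typing import Callable, Dict, List, Tuple

Graph = Dict[int, List[int]]  # undirected: u in graph[v] iff v in graph[u]
# step(node, round, state, inbox, neighbors) -> (new_state, outbox)
# inbox maps sender -> payload sent in the previous round; outbox maps neighbor -> payload.
# None is reserved as the "no message" marker, so programs must not send None.
Step = Callable[[int, int, object, Dict[int, object], List[int]],
                Tuple[object, Dict[int, object]]]


def run_lockstep(graph: Graph, init: Dict[int, object], step: Step, rounds: int):
    """Idealized model: all nodes take step r together; messages arrive in step r+1."""
    states = dict(init)
    inboxes: Dict[int, Dict[int, object]] = {v: {} for v in graph}
    for r in range(rounds):
        next_inboxes: Dict[int, Dict[int, object]] = {v: {} for v in graph}
        for v in graph:
            states[v], out = step(v, r, states[v], inboxes[v], graph[v])
            for u, msg in out.items():
                next_inboxes[u][v] = msg
        inboxes = next_inboxes
    return states


def run_synchronized(graph: Graph, init: Dict[int, object], step: Step,
                     rounds: int, seed: int = 0):
    """Same program over a simulated asynchronous network with random, unordered delays."""
    rng = random.Random(seed)
    states = dict(init)
    next_round = {v: 0 for v in graph}
    # buffered[v][r][sender] = payload that sender produced in round r
    buffered: Dict[int, Dict[int, Dict[int, object]]] = {v: {} for v in graph}
    events: list = []  # heap of (arrival_time, seq, dest, sender, round, payload)
    seq = 0
    now = 0.0

    def ready(v: int) -> bool:
        # Round 0 needs no input; round r needs round r-1 messages from all neighbors.
        r = next_round[v]
        if r >= rounds:
            return False
        return r == 0 or len(buffered[v].get(r - 1, {})) == len(graph[v])

    def execute(v: int) -> None:
        nonlocal seq
        r = next_round[v]
        received = buffered[v].pop(r - 1, {})
        inbox = {s: p for s, p in received.items() if p is not None}
        states[v], out = step(v, r, states[v], inbox, graph[v])
        for u in graph[v]:
            arrival = now + rng.uniform(0.1, 5.0)  # arbitrary, unpredictable delay
            heapq.heappush(events, (arrival, seq, u, v, r, out.get(u)))
            seq += 1
        next_round[v] = r + 1

    for v in graph:
        while ready(v):
            execute(v)
    while events:
        now, _, dest, sender, r, payload = heapq.heappop(events)
        buffered[dest].setdefault(r, {})[sender] = payload
        while ready(dest):
            execute(dest)
    return states


def max_flood(v, r, history, inbox, neighbors):
    # Record the largest value seen so far and forward it to every neighbor.
    best = max([history[-1]] + list(inbox.values()))
    return history + (best,), {u: best for u in neighbors}


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
init = {0: (5,), 1: (2,), 2: (9,), 3: (1,)}
expected = run_lockstep(path, init, max_flood, rounds=4)
for seed in range(5):
    assert run_synchronized(path, init, max_flood, rounds=4, seed=seed) == expected
```

Because messages are keyed by round, and a node never runs more than one round ahead of any neighbor, each node sees exactly the inboxes it would see in the lockstep model. Under these assumptions, its per-round history matches the idealized run whatever order the messages arrive in. The cost is one message per edge per round, even when the program has nothing to say.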
- Research Organization:
- Cornell Univ., Ithaca, NY (USA)
- OSTI ID:
- 6917508
- Country of Publication:
- United States
- Language:
- English
Similar Records
- Parallel language constructs for paradigm integration and deterministic computations
- Consistent state detection and recovery for concurrent processing