
Title: Final Report: Migration Mechanisms for Large-scale Parallel Applications

Process migration is the ability to transfer a running process from one machine to another. It is a useful facility in distributed computing environments, especially as computing devices become more pervasive and Internet access more ubiquitous. Its potential benefits include fault resilience, by migrating processes off faulty hosts; data access locality, by migrating processes closer to the data; better system response time, by migrating processes closer to users; dynamic load balancing, by migrating processes to less loaded hosts; and improved service availability and administration, by migrating processes before host maintenance so that applications continue to run with minimal downtime. Although process migration offers substantial benefits and many approaches have been considered, achieving transparent process migration has proven difficult in practice. To address this problem, we have designed, implemented, and evaluated new transparent process checkpoint-restart and migration mechanisms for desktop, server, and parallel applications that operate across heterogeneous cluster and mobile computing environments. A key aspect of this work is lightweight operating system virtualization, which gives processes private, virtual namespaces that decouple and isolate them from dependencies on the host operating system instance. This decoupling enables processes to be transparently checkpointed and migrated without modifying, recompiling, or relinking either the applications or the operating system.
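The following minimal Python sketch illustrates how a private namespace decouples a process from its host. It assumes a per-group table mapping virtual process IDs, the only IDs applications ever see, to host PIDs. The class `VirtualNamespace` and its methods are hypothetical illustrations, not the report's implementation, which works inside the operating system rather than in user code. A checkpoint records only virtual IDs. After a restart on another host, the table is rebuilt against new host PIDs, and the application keeps using the same identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualNamespace:
    """Hypothetical private PID namespace: applications only ever see virtual PIDs."""
    next_vpid: int = 1
    v2h: Dict[int, int] = field(default_factory=dict)  # virtual PID -> current host PID

    def add(self, host_pid: int) -> int:
        """Register a newly created host process and hand back its virtual PID."""
        vpid = self.next_vpid
        self.next_vpid += 1
        self.v2h[vpid] = host_pid
        return vpid

    def to_host(self, vpid: int) -> int:
        """Translate a virtual PID (e.g. the target of a kill call) to the real host PID."""
        return self.v2h[vpid]

    def checkpoint(self) -> List[int]:
        """Save only virtual PIDs; host PIDs are meaningless on another machine."""
        return sorted(self.v2h)

    @classmethod
    def restart(cls, vpids: List[int], new_host_pids: List[int]) -> "VirtualNamespace":
        """Rebuild the table on a new host, binding each saved virtual PID to a fresh host PID."""
        if len(vpids) != len(new_host_pids):
            raise ValueError("need exactly one new host PID per saved virtual PID")
        ns = cls(next_vpid=max(vpids, default=0) + 1)
        for vpid, host_pid in zip(vpids, new_host_pids):
            ns.v2h[vpid] = host_pid
        return ns


# Usage (host PIDs are made-up values for illustration)
ns = VirtualNamespace()
a = ns.add(4021)
b = ns.add(4022)
saved = ns.checkpoint()
moved = VirtualNamespace.restart(saved, [917, 918])
print(a, b, saved, moved.to_host(b))  # 1 2 [1, 2] 918
```

In this example, two processes running as hypothetical host PIDs 4021 and 4022 receive virtual PIDs 1 and 2. After they are restarted on a second host as PIDs 917 and 918, `to_host(2)` resolves to 918, while the application still refers to PID 2.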
Building on this lightweight operating system virtualization approach, we have developed novel technologies that enable (1) coordinated, consistent checkpoint-restart and migration of multiple processes, (2) fast checkpointing of process and file system state to enable restart of multiple parallel execution environments and time travel, (3) process migration across heterogeneous software environments, (4) network checkpoint-restart and migration of distributed and parallel applications, (5) a utility computing infrastructure for mobile desktop cloud computing based on process checkpoint-restart and migration functionality, (6) a process migration security architecture for protecting applications and infrastructure from denial-of-service attacks, and (7) a checkpoint-restart mobile computing system using portable storage devices.
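Item (1), coordinated and consistent checkpointing of multiple processes, depends on a simple rule. Every process in the group must be stopped before any of them is saved, so that the saved images describe a single instant. The sketch below shows only that quiesce-snapshot-resume pattern. It uses Python threads as hypothetical stand-ins for processes and a counter as their state. `Worker` and `coordinated_checkpoint` are illustrative names. A real system would stop actual processes and save their memory, file, and network state instead.

```python
import threading
import time
from typing import Dict, List


class Worker(threading.Thread):
    """Toy stand-in for one process in a checkpointed group; its state is a counter."""

    def __init__(self, wid: int, gate: threading.Event, parked: threading.Semaphore) -> None:
        super().__init__(daemon=True)
        self.wid = wid
        self.count = 0
        self.gate = gate      # set => group may run; cleared => every worker must park
        self.parked = parked  # each worker releases it once when it parks
        self.done = False

    def run(self) -> None:
        while not self.done:
            if not self.gate.is_set():
                self.parked.release()  # report: this worker is now quiescent
                self.gate.wait()       # block until the checkpoint is finished
                continue
            self.count += 1            # "useful work" that mutates state
            time.sleep(0.0001)


def coordinated_checkpoint(workers: List[Worker], gate: threading.Event,
                           parked: threading.Semaphore) -> Dict[int, int]:
    """Quiesce the whole group, snapshot every member, then resume the group."""
    gate.clear()                     # phase 1: ask every worker to stop
    for _ in workers:
        parked.acquire()             # wait until all of them have actually parked
    try:
        # phase 2: no worker is running, so these values form one consistent snapshot
        return {w.wid: w.count for w in workers}
    finally:
        gate.set()                   # phase 3: let the whole group continue
```

A typical caller starts the workers with `gate` set and `parked` initialized to zero, lets them run, and then calls `coordinated_checkpoint(workers, gate, parked)`. The returned dictionary maps each worker ID to its counter at the moment the entire group was stopped. The semaphore handshake is just one way to implement the stop-all barrier and was chosen here for brevity.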
Authors:
Publication Date:
OSTI Identifier:
966698
Report Number(s):
DE-FG02-03ER25562 - Final Report
TRN:
US201110%%194
DOE Contract Number:
FG02-03ER25562
Resource Type:
Technical Report
Research Org:
Columbia University
Sponsoring Org:
USDOE Office of Science (SC)
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; COMPUTER ARCHITECTURE; AVAILABILITY; COMPUTERS; DECOUPLING; DYNAMIC LOADS; INTERNET; MAINTENANCE; SECURITY; STORAGE