Overcoming Scalability Challenges for Tool Daemon Launching
Many tools that target parallel and distributed environments must co-locate a set of daemons with the distributed processes of the target application. However, efficient and portable deployment of these daemons on large-scale systems is an unsolved problem. We close this gap with LaunchMON, a scalable, robust, portable, secure, and general-purpose infrastructure for launching tool daemons. Its API allows tool builders to identify all processes of a target job, launch daemons on the relevant nodes, and control daemon interaction. Our results show that LaunchMON scales to very large daemon counts and substantially outperforms existing ad hoc mechanisms.
- Research Organization: Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: W-7405-ENG-48
- OSTI ID: 944365
- Report Number(s): LLNL-CONF-401480; TRN: US200902%%726
- Resource Relation: Conference: Presented at: International Conference on Parallel Processing, Portland, OR, United States, Sep 08 - Sep 12, 2008
- Country of Publication: United States
- Language: English
Similar Records
LaunchMON: An Infrastructure for Large Scale Tool Daemon Launching
Scalable resource management in high performance computers.
The PetscSF Scalable Communication Layer