OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Path to Operating System and Runtime Support for Extreme Scale Tools

Technical Report · DOI: https://doi.org/10.2172/1072945 · OSTI ID: 1072945

In this project, we cast distributed resource access as operations on files in a global name space and developed a common, scalable solution for group operations on distributed processes and files. The resulting solution enables tool and middleware developers to quickly create new scalable software or easily improve the scalability of existing software. The cornerstone of the project was the design of a new programming idiom called group file operations, which eliminates iterative behavior when a single process must apply the same set of file operations to a group of related files. To demonstrate our ideas for group file operations and global name space composition, we developed a group file system called TBON-FS that leverages a tree-based overlay network (TBON), specifically MRNet, for logarithmic communication and distributed data aggregation. We also developed proc++, a new synthetic file system co-designed for use in scalable group file operations.

Over the course of the project, we evaluated the utility and performance of group file operations, global name space composition, TBON-FS, and proc++ in three case studies. The first study focused on the ease of using group file operations and TBON-FS to quickly develop several new scalable tools for distributed system administration and monitoring. The second study evaluated the integration of group file operations and TBON-FS within the Ganglia Distributed Monitoring System to improve its scalability for clusters. The final study involved the integration of group file operations, TBON-FS, and proc++ within TotalView, the widely used parallel debugger.

For this project, the work of the Oak Ridge National Laboratory (ORNL) team occurred primarily in two directions: bringing the MRNet tree-based overlay network (TBON) implementation to the Cray XT platform, and investigating techniques for predicting the performance of MRNet topologies on such systems.
Rogue Wave Software (RWS), formerly TotalView Technologies Inc., worked with the University of Wisconsin (UW) team on the design and prototyping of a scalable version of the TotalView debugger. RWS also assisted UW with its proc++ design effort and with a strategy for integrating proc++ into TotalView.
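
The group file operations idiom summarized in the abstract can be illustrated with a small, purely local sketch. It contrasts a caller that loops over files one at a time with a single group request whose results are combined level by level, as a tree-based overlay would combine them. Every name here (`FileGroup`, `read_and_aggregate`, `tree_reduce`, the `node*.load` files) is hypothetical and chosen for illustration; none of it is the TBON-FS, MRNet, or proc++ interface, and no network is involved.

```python
import tempfile
from pathlib import Path


def read_iteratively(paths):
    """Baseline: the caller loops, issuing one open/read/close per file."""
    values = []
    for path in paths:
        with open(path) as handle:
            values.append(int(handle.read()))
    return values


def tree_reduce(values, combine, fanout=2):
    """Combine values level by level, as a tree with the given fanout would.

    Returns (result, levels), where levels is the number of combining
    steps between the leaves and the root.
    """
    level = list(values)
    if not level:
        raise ValueError("tree_reduce needs at least one value")
    levels = 0
    while len(level) > 1:
        # Each parent combines up to `fanout` children.
        level = [combine(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
        levels += 1
    return level[0], levels


class FileGroup:
    """Hypothetical group handle; an assumption, not the TBON-FS API."""

    def __init__(self, paths):
        self.paths = list(paths)

    def read_and_aggregate(self, parse, combine, fanout=2):
        # One logical request covers every member of the group. In this
        # sketch the members are read locally; a group file system would
        # instead forward the request to the hosts that own the files.
        leaves = [parse(Path(p).read_text()) for p in self.paths]
        return tree_reduce(leaves, combine, fanout)


def demo():
    """Write eight small files, then total them both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for node in range(8):
            path = Path(tmp) / f"node{node}.load"
            path.write_text(str(node))
            paths.append(path)
        iterative_total = sum(read_iteratively(paths))
        group_total, levels = FileGroup(paths).read_and_aggregate(
            int, sum, fanout=2)
        return iterative_total, group_total, levels


if __name__ == "__main__":
    print(demo())
```

With eight files and a fanout of 2, the group request is combined in three levels, whereas the iterative caller touches all eight files itself. The number of levels grows with the logarithm of the group size, which is the property the abstract attributes to the TBON.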

Research Organization:
TotalView Technologies LLC
Sponsoring Organization:
USDOE; USDOE SC Office of Advanced Scientific Computing Research (SC-21)
DOE Contract Number:
FG02-08ER25843
OSTI ID:
1072945
Report Number(s):
DE-PS02-08ER25843
Country of Publication:
United States
Language:
English

Similar Records

Lightweight and Statistical Techniques for Petascale Debugging
Technical Report · Mon Jun 30 2014 · OSTI ID:1072945

Purple L1 Milestone Review Panel TotalView Debugger Functionality and Performance for ASC Purple
Technical Report · Tue Dec 12 2006 · OSTI ID:1072945

Stack Trace Analysis Tool (STAT)
Software · Tue Jan 16 2018 · OSTI ID:1072945

Related Subjects