A Path to Operating System and Runtime Support for Extreme Scale Tools
In this project, we cast distributed resource access as operations on files in a global name space and developed a common, scalable solution for group operations on distributed processes and files. The resulting solution enables tool and middleware developers to quickly create new scalable software or easily improve the scalability of existing software. The cornerstone of the project was the design of a new programming idiom called group file operations, which eliminates iterative behavior when a single process must apply the same set of file operations to a group of related files. To demonstrate our ideas for group file operations and global name space composition, we developed a group file system called TBON-FS that leverages a tree-based overlay network (TBON), specifically MRNet, for logarithmic communication and distributed data aggregation. We also developed proc++, a new synthetic file system co-designed for use in scalable group file operations. Over the course of the project, we evaluated the utility and performance of group file operations, global name space composition, TBON-FS, and proc++ in three case studies. The first study focused on the ease of using group file operations and TBON-FS to quickly develop several new scalable tools for distributed system administration and monitoring. The second study evaluated the integration of group file operations and TBON-FS within the Ganglia Distributed Monitoring System to improve its scalability for clusters. The final study involved the integration of group file operations, TBON-FS, and proc++ within TotalView, the widely used parallel debugger.
The work of the Oak Ridge National Laboratory (ORNL) team occurred primarily in two directions: bringing the MRNet tree-based overlay network (TBON) implementation to the Cray XT platform, and investigating techniques for predicting the performance of MRNet topologies on such systems.
Rogue Wave Software (RWS), formerly TotalView Technologies Inc., worked with the University of Wisconsin (UW) team on the design and prototyping of a scalable version of the TotalView debugger. RWS assisted UW with the "proc++" design effort and with the strategy for integrating proc++ into TotalView.
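The group file operations idiom described in the abstract can be illustrated with a minimal sketch. This is not the TBON-FS interface: the `GroupFile` class, its `read_all` method, and the `demo` helper are hypothetical names invented for illustration, and the sketch uses ordinary local files as stand-ins for per-host files in a global name space. It shows only the shape of the idiom, in which the caller issues one operation on a group handle and supplies an aggregation function instead of writing a loop over individual files.

```python
import os
import tempfile


class GroupFile:
    """Hypothetical handle over a group of files (not the TBON-FS API)."""

    def __init__(self, paths):
        self.paths = list(paths)
        self._files = [open(p, "r") for p in self.paths]

    def read_all(self, aggregate=None):
        """Apply one read to every member; optionally aggregate the results."""
        results = [f.read() for f in self._files]
        return aggregate(results) if aggregate else results

    def close(self):
        for f in self._files:
            f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False


def demo():
    # Build local stand-ins for per-host status files (assumed layout).
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for host, load in [("node0", 3), ("node1", 5), ("node2", 4)]:
            path = os.path.join(root, host + ".load")
            with open(path, "w") as f:
                f.write(str(load))
            paths.append(path)
        # One group operation replaces the caller's per-file loop.
        with GroupFile(paths) as group:
            return group.read_all(
                aggregate=lambda vals: sum(int(v) for v in vals)
            )


if __name__ == "__main__":
    print(demo())
```

In this sketch the iteration is merely hidden inside the handle and still runs in a single process. In the system the abstract describes, the per-file work and the aggregation would instead be distributed across the tree-based overlay network.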
- Research Organization:
- TotalView Technologies LLC
- Sponsoring Organization:
- USDOE; USDOE SC Office of Advanced Scientific Computing Research (SC-21)
- DOE Contract Number:
- FG02-08ER25843
- OSTI ID:
- 1072945
- Report Number(s):
- DE-PS02-08ER25843
- Country of Publication:
- United States
- Language:
- English