On the memory attribution problem: A solution and case study using MPI
Abstract
As parallel applications running on large-scale computing systems become increasingly memory constrained, the ability to attribute memory usage to the various components of the application is becoming increasingly important. We present the design and implementation of memnesia, a novel memory usage profiler for parallel and distributed message-passing applications. Our approach captures both application- and message-passing library-specific memory usage statistics from unmodified binaries dynamically linked to a message-passing communication library. Using microbenchmarks and proxy applications, we evaluated our profiler across three Message Passing Interface (MPI) implementations and two hardware platforms. The results show that our approach and the corresponding implementation can accurately quantify memory resource usage as a function of time, scale, communication workload, and software or hardware system architecture, clearly distinguishing between application and MPI library memory usage at a per-process level. With this new capability, we show that job size, communication workload, and hardware/software architecture influence peak runtime memory usage. In practice, this tool provides a potentially valuable source of information for application developers seeking to measure and optimize memory usage.
- Authors:
- Gutiérrez, Samuel Keith; Arnold, Dorian C.; Davis, Kei Marion; McCormick, Patrick Sean
- Affiliations:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Univ. of New Mexico, Albuquerque, NM (United States)
- Emory Univ., Atlanta, GA (United States)
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Publication Date:
- February 4, 2019
- Research Org.:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 1495167
- Alternate Identifier(s):
- OSTI ID: 1493495
- Report Number(s):
- LA-UR-18-30292
- Journal ID: ISSN 1532-0626
- Grant/Contract Number:
- 89233218CNA000001; AC52‐06NA25396
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Concurrency and Computation: Practice and Experience
- Additional Journal Information:
- Journal Volume: 32; Journal Issue: 3; Journal ID: ISSN 1532-0626
- Publisher:
- Wiley
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Computer Science
Citation Formats
Gutiérrez, Samuel Keith, Arnold, Dorian C., Davis, Kei Marion, and McCormick, Patrick Sean. On the memory attribution problem: A solution and case study using MPI. United States: N. p., 2019. Web. doi:10.1002/cpe.5159.
Gutiérrez, Samuel Keith, Arnold, Dorian C., Davis, Kei Marion, & McCormick, Patrick Sean. On the memory attribution problem: A solution and case study using MPI. United States. https://doi.org/10.1002/cpe.5159
Gutiérrez, Samuel Keith, Arnold, Dorian C., Davis, Kei Marion, and McCormick, Patrick Sean. 2019. "On the memory attribution problem: A solution and case study using MPI". United States. https://doi.org/10.1002/cpe.5159. https://www.osti.gov/servlets/purl/1495167.
@article{osti_1495167,
title = {On the memory attribution problem: A solution and case study using MPI},
author = {Gutiérrez, Samuel Keith and Arnold, Dorian C. and Davis, Kei Marion and McCormick, Patrick Sean},
abstractNote = {As parallel applications running on large-scale computing systems become increasingly memory constrained, the ability to attribute memory usage to the various components of the application is becoming increasingly important. We present the design and implementation of memnesia, a novel memory usage profiler for parallel and distributed message-passing applications. Our approach captures both application- and message-passing library-specific memory usage statistics from unmodified binaries dynamically linked to a message-passing communication library. Using microbenchmarks and proxy applications, we evaluated our profiler across three Message Passing Interface (MPI) implementations and two hardware platforms. The results show that our approach and the corresponding implementation can accurately quantify memory resource usage as a function of time, scale, communication workload, and software or hardware system architecture, clearly distinguishing between application and MPI library memory usage at a per-process level. With this new capability, we show that job size, communication workload, and hardware/software architecture influence peak runtime memory usage. In practice, this tool provides a potentially valuable source of information for application developers seeking to measure and optimize memory usage.},
doi = {10.1002/cpe.5159},
journal = {Concurrency and Computation: Practice and Experience},
number = 3,
volume = 32,
place = {United States},
year = {2019},
month = feb
}