Diagnosing the Causes and Severity of One-sided Message Contention
Two trends suggest network contention for one-sided messages is poised to become a performance problem that concerns application developers: an increased interest in one-sided programming models and a rising ratio of hardware threads to network injection bandwidth. Unfortunately, it is difficult to reason about network contention and one-sided messages because one-sided tasks can either decrease or increase contention. We present effective and portable techniques for diagnosing the causes and severity of one-sided message contention. To detect that a message is affected by contention, we maintain statistics representing instantaneous (non-local) network resource demand. Using lightweight measurement and modeling, we identify the portion of a message's latency that is due to contention and whether contention occurs at the initiator or target. We attribute these metrics to program statements in their full static and dynamic context. We characterize contention for an important computational chemistry benchmark on InfiniBand, Cray Aries, and IBM Blue Gene/Q interconnects. We pinpoint the sources of contention, estimate their severity, and show that when message delivery time deviates from an ideal model, there are other messages contending for the same network links. With a small change to the benchmark, we reduce contention by up to 50% and improve total runtime by as much as 20%.
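The abstract does not spell out the ideal model it compares against. As a rough illustration of splitting a message's latency into an ideal part and a contention part, the sketch below assumes a simple fixed-latency-plus-bandwidth model; that model, the names `Message`, `ideal_latency`, `contention_excess`, and `contention_fraction`, and all parameter values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One observed one-sided transfer (hypothetical record layout)."""
    size_bytes: int
    measured_latency_s: float


def ideal_latency(size_bytes: int, base_latency_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """Assumed contention-free delivery time: fixed cost plus size / bandwidth."""
    return base_latency_s + size_bytes / bandwidth_bytes_per_s


def contention_excess(msg: Message, base_latency_s: float,
                      bandwidth_bytes_per_s: float) -> float:
    """Latency beyond the assumed ideal model, clamped at zero."""
    ideal = ideal_latency(msg.size_bytes, base_latency_s, bandwidth_bytes_per_s)
    return max(0.0, msg.measured_latency_s - ideal)


def contention_fraction(msg: Message, base_latency_s: float,
                        bandwidth_bytes_per_s: float) -> float:
    """Share of the measured latency attributed to contention under this model."""
    if msg.measured_latency_s <= 0.0:
        return 0.0
    excess = contention_excess(msg, base_latency_s, bandwidth_bytes_per_s)
    return excess / msg.measured_latency_s
```

In this sketch, a message whose measured latency is twice its ideal latency has half of its latency attributed to contention. Deciding whether the excess arises at the initiator or the target would need additional per-endpoint measurements, which the sketch does not model.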
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1178498
- Report Number(s):
- PNNL-SA-106916; KJ0402000
- Resource Relation:
- Conference: 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '15), February 7-11, 2015, San Francisco, California, 130-139
- Country of Publication:
- United States
- Language:
- English