OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Diagnosing the Causes and Severity of One-sided Message Contention

Conference

Two trends suggest network contention for one-sided messages is poised to become a performance problem that concerns application developers: an increased interest in one-sided programming models and a rising ratio of hardware threads to network injection bandwidth. Unfortunately, it is difficult to reason about network contention and one-sided messages because one-sided tasks can either decrease or increase contention. We present effective and portable techniques for diagnosing the causes and severity of one-sided message contention. To detect that a message is affected by contention, we maintain statistics representing instantaneous (non-local) network resource demand. Using lightweight measurement and modeling, we identify the portion of a message's latency that is due to contention and whether contention occurs at the initiator or target. We attribute these metrics to program statements in their full static and dynamic context. We characterize contention for an important computational chemistry benchmark on InfiniBand, Cray Aries, and IBM Blue Gene/Q interconnects. We pinpoint the sources of contention, estimate their severity, and show that when message delivery time deviates from an ideal model, there are other messages contending for the same network links. With a small change to the benchmark, we reduce contention by up to 50% and improve total runtime by as much as 20%.
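
The abstract's idea of comparing message delivery time against an ideal model can be illustrated with a minimal sketch. This is not the paper's measurement method or model: the linear cost model (a fixed base latency plus size divided by bandwidth), the `Message` type, and the function and parameter names are all assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Message:
    size_bytes: int          # payload size in bytes
    observed_latency: float  # measured delivery time in seconds


def ideal_latency(size_bytes: int, base_latency: float, bandwidth: float) -> float:
    """Contention-free delivery time under an assumed linear cost model."""
    return base_latency + size_bytes / bandwidth


def contention_fraction(msg: Message, base_latency: float, bandwidth: float) -> float:
    """Fraction of the observed latency that exceeds the ideal model.

    Returns a value in [0, 1]; 0 means the message met or beat the ideal time.
    """
    if msg.observed_latency <= 0:
        return 0.0
    excess = msg.observed_latency - ideal_latency(
        msg.size_bytes, base_latency, bandwidth
    )
    return max(0.0, excess) / msg.observed_latency


if __name__ == "__main__":
    # Hypothetical network parameters: 2 microseconds base latency, 1 GB/s.
    msg = Message(size_bytes=8000, observed_latency=20e-6)
    print(contention_fraction(msg, base_latency=2e-6, bandwidth=1e9))
```

In this sketch, a message whose observed latency matches the ideal model has a contention fraction of zero, and the fraction grows toward one as delivery time deviates further from the model. Deciding whether the excess arises at the initiator or the target, as the abstract describes, would need additional per-endpoint measurements that this sketch does not attempt.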

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1178498
Report Number(s):
PNNL-SA-106916; KJ0402000
Resource Relation:
Conference: 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '15), February 7-11, 2015, San Francisco, California, pp. 130-139
Country of Publication:
United States
Language:
English
