Diagnosing the Causes and Severity of One-sided Message Contention

Tallent, Nathan R.; Vishnu, Abhinav; van Dam, Hubertus; Daily, Jeffrey A.; Kerbyson, Darren J.; Hoisie, Adolfy

doi:10.1145/2688500.2688516

Conference · February 11, 2015

DOI: https://doi.org/10.1145/2688500.2688516 · OSTI ID: 1178498

Two trends suggest network contention for one-sided messages is poised to become a performance problem that concerns application developers: an increased interest in one-sided programming models and a rising ratio of hardware threads to network injection bandwidth. Unfortunately, it is difficult to reason about network contention and one-sided messages because one-sided tasks can either decrease or increase contention. We present effective and portable techniques for diagnosing the causes and severity of one-sided message contention. To detect that a message is affected by contention, we maintain statistics representing instantaneous (non-local) network resource demand. Using lightweight measurement and modeling, we identify the portion of a message's latency that is due to contention and whether contention occurs at the initiator or target. We attribute these metrics to program statements in their full static and dynamic context. We characterize contention for an important computational chemistry benchmark on InfiniBand, Cray Aries, and IBM Blue Gene/Q interconnects. We pinpoint the sources of contention, estimate their severity, and show that when message delivery time deviates from an ideal model, there are other messages contending for the same network links. With a small change to the benchmark, we reduce contention up to 50% and improve total runtime as much as 20%.
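As a rough illustration of what it means for message delivery time to deviate from an ideal model, the sketch below estimates the share of an observed latency that exceeds a contention-free prediction. It is a minimal sketch under stated assumptions, not the authors' method: the ideal model here is a simple latency-plus-bandwidth term chosen for illustration, and the names `Message`, `ideal_time`, and `contention_fraction`, as well as the parameter values in the usage note, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One observed one-sided message (hypothetical record layout)."""
    size_bytes: int      # payload size
    observed_s: float    # measured delivery time in seconds


def ideal_time(size_bytes: int, base_latency_s: float, bandwidth_Bps: float) -> float:
    """Contention-free delivery time under an assumed latency + size/bandwidth model."""
    return base_latency_s + size_bytes / bandwidth_Bps


def contention_fraction(msg: Message, base_latency_s: float, bandwidth_Bps: float) -> float:
    """Fraction of the observed latency that exceeds the ideal prediction.

    Returns 0.0 when the message is at least as fast as the ideal model.
    """
    if msg.observed_s <= 0:
        return 0.0
    excess = max(0.0, msg.observed_s - ideal_time(msg.size_bytes, base_latency_s, bandwidth_Bps))
    return excess / msg.observed_s
```

For example, with an assumed 1 microsecond base latency and 1 GB/s bandwidth, a 1 MB message that takes twice its ideal time would be attributed a contention fraction of about one half. Deciding whether that excess arises at the initiator or the target requires additional measurements that this sketch does not model.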

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC05-76RL01830

OSTI ID: 1178498

Report Number(s): PNNL-SA-106916; KJ0402000

Resource Relation: Conference: 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '15), February 7-11, 2015, San Francisco, California, pp. 130-139

Country of Publication: United States

Language: English

Related Subjects

Network Contention
Performance Analysis
Dynamic Modeling
