U.S. Department of Energy
Office of Scientific and Technical Information

Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation.

Conference
Abstract not provided.
Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States); Sandia National Laboratories, Livermore, CA
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA), Office of Defense Nuclear Security (NA-70)
DOE Contract Number:
NA0003525
OSTI ID:
1891963
Report Number(s):
SAND2021-10802C; 700889
Country of Publication:
United States
Language:
English
