Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation
Abstract not provided.
- Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA), Office of Defense Nuclear Security
- DOE Contract Number: NA0003525
- OSTI ID: 1891963
- Report Number(s): SAND2021-10802C; 700889
- Resource Relation: Conference: Proposed for presentation at the IEEE High Performance Extreme Computing Conference (HPEC), held September 20-24, 2021
- Country of Publication: United States
- Language: English