Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation.
Abstract not provided.
- Research Organization: Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States); Sandia National Laboratories, Livermore, CA
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA), Office of Defense Nuclear Security (NA-70)
- DOE Contract Number: NA0003525
- OSTI ID: 1891963
- Report Number(s): SAND2021-10802C; 700889
- Country of Publication: United States
- Language: English