SMALE: Enhancing Scalability of Machine Learning Algorithms on Extreme-Scale Computing Platforms
- Duke Univ., Durham, NC (United States)
Deployment and execution of machine learning tasks on extreme-scale computing platforms face several significant technical challenges:

1) High computing cost of dense networks. The computing workload of deep networks with densely connected topologies grows rapidly with network size, imposing a non-scalable computing model on extreme-scale platforms.
2) Non-optimized workload distribution. Many advanced deep learning techniques, e.g., sparsification and irregular network topologies, produce highly unbalanced workload distributions on extreme-scale platforms. Computational efficiency is further hindered by the resulting data and computation redundancies and by the long tails of nodes carrying the heaviest workloads.
3) Constraints on data movement and I/O bottlenecks. Inter-node data movement on extreme-scale platforms incurs high energy and latency costs and is subject to I/O bandwidth constraints.
4) Difficulty of generalizing algorithm realization and acceleration. The wide variety of machine learning algorithms and of extreme-scale platform structures makes deriving a generalized realization and acceleration method very challenging, yet such generality is exactly what domain scientists and interested users require.

We call these challenges Smale’s Problems in Machine Learning and Understanding for High-Performance Computing Scientific Discovery. The objective of this three-year research project is to develop a holistic set of innovations at the structure, assembly, and acceleration layers of machine learning algorithms that addresses these challenges in algorithm deployment and execution. Three tasks are performed:

- At the algorithm structure level, we investigate techniques that structurally sparsify the topology of deep networks to reduce computing workload (a minimal sketch of this idea appears after this abstract), and we study clustering and pruning techniques that optimize workload distribution over extreme-scale platforms.
- At the algorithm assembly level, we derive a unified learning framework with unsupervised transfer learning and dynamic growing capabilities, and we exploit novel training methods to improve the framework's training efficiency.
- At the algorithm acceleration level, we develop a series of techniques that accelerate sparse matrix operations, one of the core computations in deep learning, and that optimize memory access on the platforms of interest.

The proposed techniques attack fundamental problems of machine learning algorithms running on extreme-scale computing platforms by vertically integrating solutions at three closely entangled layers, paving a long-term scaling path for machine learning applications in the DOE context. The three tasks, corresponding to the respective research orientations above, are carried out during the three-year project period with our collaborators at ORNL. The anticipated outcome is a holistic solution set of novel algorithms and network topologies, efficient training techniques, and fast acceleration methods that promotes the computing scalability of machine learning applications of particular interest to DOE.
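To make the structure-level direction concrete, below is a minimal sketch of structured sparsification via a group-lasso regularizer that pushes entire convolution filters toward zero so they can be pruned away, shrinking both the network topology and the per-node workload. The report does not specify an implementation; the use of PyTorch, the function name group_lasso_penalty, the coefficient lam, and the pruning threshold are all illustrative assumptions, not the project's actual code.

```python
import torch.nn as nn

def group_lasso_penalty(model, lam=1e-4):
    """Group-lasso term: sum of per-filter L2 norms over Conv2d layers.

    Treating each output filter as one group drives whole filters to
    zero during training, so they can later be removed outright,
    turning a dense topology into a structurally sparse one.
    (Illustrative sketch; not taken from the report.)
    """
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # weight shape: (out_channels, in_channels, kH, kW);
            # flatten each filter and take its L2 norm
            penalty = penalty + m.weight.flatten(1).norm(dim=1).sum()
    return lam * penalty

# Hypothetical use inside a training step:
#   loss = criterion(model(x), y) + group_lasso_penalty(model)
#   loss.backward()
# After training, filters whose norm falls below a small threshold
# (e.g., 1e-3) can be pruned; the surviving sparse weights are the
# kind of input the acceleration-level sparse-matrix kernels consume.
```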
- Research Organization: Duke Univ., Durham, NC (United States)
- Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- DOE Contract Number: SC0018064
- OSTI ID: 1846568
- Report Number(s): DOE-Duke-18064-1
- Country of Publication: United States
- Language: English
GraphR: Accelerating Graph Processing Using ReRAM | conference | February 2018
ZARA: A Novel Zero-free Dataflow Accelerator for Generative Adversarial Networks in 3D ReRAM | conference | June 2019
ReRAM-based accelerator for deep learning | conference | March 2018
AutoGrow: Automatic Layer Growing in Deep Convolutional Networks | conference | August 2020
Exploring Applications of STT-RAM in GPU Architectures | journal | January 2021
ReGAN: A pipelined ReRAM-based accelerator for generative adversarial networks | conference | January 2018
HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array | conference | February 2019
MobiEye: An Efficient Cloud-based Video Detection System for Real-time Mobile Applications | conference | June 2019
NeuralHMC: an efficient HMC-based accelerator for deep neural networks | conference | January 2019
Similar Records
Scalable deep text comprehension for cancer surveillance on high-performance computing
Analyzing inference workloads for spatiotemporal modeling | Conference | November 30, 2018 | OSTI ID: 1491345
Analyzing inference workloads for spatiotemporal modeling | Journal Article | September 16, 2024 | Future Generation Computer Systems | OSTI ID: 2513464