SMALE: Enhancing Scalability of Machine Learning Algorithms on Extreme-Scale Computing Platforms
Deployment and execution of machine learning tasks on extreme-scale computing platforms face several significant technical challenges: 1) High computing cost incurred by dense networks – the computing workload of deep networks with densely connected topology grows rapidly with network size, imposing a non-scalable computing model on extreme-scale computing platforms; 2) Non-optimized workload distribution – many advanced deep learning algorithms, e.g., sparsification and irregular network topology, produce very unbalanced workload distributions on extreme-scale computing platforms, so computation efficiency is greatly hindered by the resulting data and computation redundancies as well as the long-tail latency of nodes with heavy workloads; 3) Constraints in data movement and I/O bottlenecks – inter-node data movement in extreme-scale computing platforms incurs high energy and latency costs and is subject to I/O bandwidth constraints; and 4) Generalization of algorithm realization and acceleration on computing platforms – the large variety of machine learning algorithms and of extreme-scale platform structures makes deriving a generalized algorithm realization and acceleration method very challenging, although such generality is exactly what domain scientists and interested users require. We call the above challenges Smale's Problems in Machine Learning and Understanding for High-Performance Computing Scientific Discovery.
The objective of our three-year research project is to develop a holistic set of innovations at the structure, assembly, and acceleration layers of machine learning algorithms to address the above challenges in algorithm deployment and execution. Three tasks are performed: At the algorithm structure level, we investigate techniques that structurally sparsify the topology of deep networks to reduce computing workload. We also study clustering and pruning techniques that optimize workload distribution over extreme-scale computing platforms. At the algorithm assembly level, we derive a unified learning framework with unsupervised transfer learning and dynamic network-growing capabilities; novel training methods are also explored to enhance the training efficiency of the proposed framework. At the algorithm acceleration level, we develop a series of techniques that accelerate sparse matrix operations, one of the core computations in deep learning, and optimize memory access on the concerned platforms. Our proposed techniques attack the fundamental problems of machine learning algorithms running on extreme-scale computing platforms by vertically integrating solutions at three closely entangled layers, paving a long-term scaling path for machine learning applications in the DOE context. The three tasks corresponding to the above research orientations are performed during the three-year project period with our collaborators at ORNL. The outcome of the project is anticipated to form a holistic solution set of novel algorithms and network topologies, efficient training techniques, and fast acceleration methods that promote the computing scalability of machine learning applications of particular interest to DOE.
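The acceleration task above centers on sparse matrix operations, whose cost scales with the number of nonzero weights rather than the full dense dimensions. As a minimal illustrative sketch (not the project's actual implementation), a plain-Python compressed sparse row (CSR) matrix-vector multiply shows why a structurally sparsified network reduces workload; the function name `csr_matvec` and the toy matrix are illustrative.

```python
# Minimal CSR (compressed sparse row) sparse matrix-vector multiply.
# CSR stores only the nonzero entries, so the inner loop runs once per
# nonzero -- this is the property that makes sparsified networks cheaper.

def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    data    -- nonzero values, row by row
    indices -- column index of each nonzero
    indptr  -- indptr[i]:indptr[i+1] is the slice of data/indices for row i
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# Toy example:  A = [[1, 0, 2],
#                    [0, 0, 3],
#                    [4, 5, 0]]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
indices = [0, 2, 2, 0, 1]
indptr = [0, 2, 3, 5]
print(csr_matvec(data, indices, indptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

Here the multiply touches only 5 stored values instead of the 9 entries of the dense matrix; for the highly sparse networks targeted by the project, that gap is what the acceleration layer exploits.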
- Research Organization:
- Duke Univ., Durham, NC (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- DOE Contract Number:
- SC0018064
- OSTI ID:
- 1846568
- Report Number(s):
- DOE-Duke-18064-1
- Resource Relation:
- Related Information:
  1. F. Chen, L. Song, and Y. Chen, "ReGAN: A Pipelined ReRAM-Based Accelerator for Generative Adversarial Networks," Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2018, pp. 178-183. DOI: 10.1109/ASPDAC.2018.8297302.
  2. L. Song, Y. Zhuo, X. Qian, H. Li, and Y. Chen, "GraphR: Accelerating Graph Processing Using ReRAM," International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2018, pp. 531-543. DOI: 10.1109/HPCA.2018.00052.
  3. B. Li, L. Song, F. Chen, X. Qian, Y. Chen, and H. Li, "ReRAM-based Accelerator for Deep Learning," Design, Automation & Test in Europe (DATE), Mar. 2018, pp. 815-820. DOI: 10.23919/DATE.2018.8342118.
  4. C. Min, J. Mao, H. Li, and Y. Chen, "NeuralHMC: An Efficient HMC-based Accelerator for Deep Neural Networks," Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2019, pp. 394-399. DOI: 10.1145/3287624.3287642.
  5. F. Chen, L. Song, H. Li, and Y. Chen, "ZARA: A Novel Zero-free Dataflow Accelerator for Generative Adversarial Networks in 3D ReRAM," Design Automation Conference (DAC), Jun. 2019, Article no. 133. DOI: 10.1145/3316781.3317936.
  6. J. Mao, Q. Yang, A. Li, H. Li, and Y. Chen, "MobiEye: An Efficient Cloud-based Video Detection System for Real-time Mobile Applications," Design Automation Conference (DAC), Jun. 2019, Article no. 102. DOI: 10.1145/3316781.3317865.
  7. L. Song, J. Mao, Y. Zhuo, X. Qian, H. Li, and Y. Chen, "HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array," International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2019, pp. 56-68. DOI: 10.1109/HPCA.2019.00027.
  8. W. Wen, F. Yan, Y. Chen, and H. Li, "AutoGrow: Automatic Layer Growing in Deep Convolutional Networks," ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Aug. 2020, pp. 833-841. DOI: 10.1145/3394486.3403126.
  9. X. Liu, M. Mao, X. Bi, H. Li, and Y. Chen, "Exploring Applications of STT-RAM in GPU Architectures," IEEE Transactions on Circuits and Systems I (TCAS-I), vol. 68, no. 1, Jan. 2021, pp. 238-249. DOI: 10.1109/TCSI.2020.3031895.
- Country of Publication:
- United States
- Language:
- English