OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: SMALE: Enhancing Scalability of Machine Learning Algorithms on Extreme-Scale Computing Platforms

Technical Report · DOI: https://doi.org/10.2172/1846568 · OSTI ID: 1846568

Deployment and execution of machine learning tasks on extreme-scale computing platforms face several significant technical challenges:

1) High computing cost incurred by dense networks. The computing workload of deep networks with densely connected topologies grows rapidly with network size, imposing a non-scalable computing model on extreme-scale computing platforms.
2) Non-optimized workload distribution. Many advanced deep learning techniques, e.g., sparsification and irregular network topologies, produce very unbalanced workload distributions on extreme-scale computing platforms. Computation efficiency is greatly hindered by the resulting data and computation redundancies, as well as by the long tails of nodes carrying the heaviest workloads.
3) Constraints in data movement and I/O bottlenecks. Inter-node data movement in extreme-scale computing platforms incurs high energy and latency costs and is subject to I/O bandwidth constraints.
4) Generalization of algorithm realization and acceleration across computing platforms. The wide variety of machine learning algorithms and of extreme-scale computing platform architectures makes it very challenging to derive a generalized method for algorithm realization and acceleration, which is nonetheless what domain scientists and interested users require.

We call the above challenges Smale's Problems in Machine Learning and Understanding for High-Performance Computing Scientific Discovery. The objective of our three-year research project is to develop a holistic set of innovations at the structure, assembly, and acceleration layers of machine learning algorithms to address these challenges in algorithm deployment and execution. Three tasks are performed. At the algorithm structure level, we investigate techniques that structurally sparsify the topology of deep networks to reduce computing workload; we also study clustering and pruning techniques that optimize workload distribution across extreme-scale computing platforms. At the algorithm assembly level, we derive a unified learning framework with unsupervised transfer learning and dynamic growing capabilities; novel training methods are also explored to enhance the training efficiency of the proposed framework. At the algorithm acceleration level, we develop a series of techniques that accelerate sparse matrix operations, one of the core computations in deep learning, and that optimize memory access on the target platforms.

Our proposed techniques attack the fundamental problems of running machine learning algorithms on extreme-scale computing platforms by vertically integrating solutions at these three closely entangled layers, paving a long-term scaling path for machine learning applications in the DOE context. The three tasks corresponding to the respective research orientations above are performed during the three-year project period with our collaborators at ORNL. The outcome of the project is anticipated to form a holistic solution set of novel algorithms and network topologies, efficient training techniques, and fast acceleration methods that promote the computing scalability of machine learning applications of particular interest to DOE.
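The structural sparsification pursued at the algorithm structure level can be illustrated with a small sketch. The snippet below is a minimal, hypothetical example (not the project's implementation) that adds a group-Lasso-style penalty over whole convolution filters in PyTorch, so that entire filters are driven toward zero and can be pruned as a unit; this is the kind of structured sparsity that reduces dense-network workload while avoiding irregular, hard-to-balance weight patterns.

```python
# Minimal sketch (assumed PyTorch usage); illustrates structured, filter-level
# sparsity via a group-Lasso penalty. Illustrative only, not the project's code.
import torch
import torch.nn as nn

def group_lasso_penalty(conv: nn.Conv2d) -> torch.Tensor:
    """Sum of L2 norms of each output filter: pushes whole filters to zero."""
    # weight shape: (out_channels, in_channels, kH, kW)
    return conv.weight.flatten(start_dim=1).norm(dim=1).sum()

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
lam = 1e-4  # regularization strength (illustrative value)

x = torch.randn(8, 3, 32, 32)
out = model(x)
loss = out.mean()  # placeholder task loss
loss = loss + lam * sum(group_lasso_penalty(m) for m in model.modules()
                        if isinstance(m, nn.Conv2d))
loss.backward()
optimizer.step()
```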
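The payoff targeted at the algorithm acceleration level, where sparse matrix operations become the dominant kernel, can likewise be seen in a small sketch. The example below uses SciPy's CSR format purely for exposition (an assumed, generic illustration, not one of the accelerators developed in the project) to show that storage and arithmetic scale with the nonzero count rather than with the full dense dimensions.

```python
# Minimal sketch using SciPy (assumed dependency); shows how storing only the
# nonzeros in CSR format reduces both memory and multiply-accumulate work.
import numpy as np
from scipy.sparse import random as sparse_random

n = 4096
density = 0.01  # 1% nonzeros, illustrative of a heavily pruned layer
W = sparse_random(n, n, density=density, format="csr", dtype=np.float32)
x = np.random.rand(n).astype(np.float32)

y = W @ x  # sparse matrix-vector product touches only the stored nonzeros

dense_flops = 2 * n * n   # dense mat-vec multiply-accumulates
sparse_flops = 2 * W.nnz  # work proportional to the nonzero count
print(f"nonzeros: {W.nnz}, dense FLOPs: {dense_flops}, sparse FLOPs: {sparse_flops}")
```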

Research Organization:
Duke Univ., Durham, NC (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
SC0018064
OSTI ID:
1846568
Report Number(s):
DOE-Duke-18064-1
Resource Relation:
Related Information:
1. F. Chen, L. Song, and Y. Chen, “ReGAN: A Pipelined ReRAM-Based Accelerator for Generative Adversarial Networks,” Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2018, pp. 178-183. DOI: 10.1109/ASPDAC.2018.8297302.
2. L. Song, Y. Zhuo, X. Qian, H. Li, and Y. Chen, “GraphR: Accelerating Graph Processing Using ReRAM,” International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2018, pp. 531-543. DOI: 10.1109/HPCA.2018.00052.
3. B. Li, L. Song, F. Chen, X. Qian, Y. Chen, and H. Li, “ReRAM-based Accelerator for Deep Learning,” Design, Automation & Test in Europe (DATE), Mar. 2018, pp. 815-820. DOI: 10.23919/DATE.2018.8342118.
4. C. Min, J. Mao, H. Li, and Y. Chen, “NeuralHMC: An Efficient HMC-based Accelerator for Deep Neural Networks,” Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2019, pp. 394-399. DOI: 10.1145/3287624.3287642.
5. F. Chen, L. Song, H. Li, and Y. Chen, “ZARA: A Novel Zero-free Dataflow Accelerator for Generative Adversarial Networks in 3D ReRAM,” Design Automation Conference (DAC), Jun. 2019, Article no. 133. DOI: 10.1145/3316781.3317936.
6. J. Mao, Q. Yang, A. Li, H. Li, and Y. Chen, “MobiEye: An Efficient Cloud-based Video Detection System for Real-time Mobile Applications,” Design Automation Conference (DAC), Jun. 2019, Article no. 102. DOI: 10.1145/3316781.3317865.
7. L. Song, J. Mao, Y. Zhuo, X. Qian, H. Li, and Y. Chen, “HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array,” International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2019, pp. 56-68. DOI: 10.1109/HPCA.2019.00027.
8. W. Wen, F. Yan, Y. Chen, and H. Li, “AutoGrow: Automatic Layer Growing in Deep Convolutional Networks,” ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Aug. 2020, pp. 833-841. DOI: 10.1145/3394486.3403126.
9. X. Liu, M. Mao, X. Bi, H. Li, and Y. Chen, “Exploring Applications of STT-RAM in GPU Architectures,” IEEE Transactions on Circuits and Systems I (TCAS-I), vol. 68, no. 1, Jan. 2021, pp. 238-249. DOI: 10.1109/TCSI.2020.3031895.
Country of Publication:
United States
Language:
English
