OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: SMALE: Enhancing Scalability of Machine Learning Algorithms on Extreme-Scale Computing Platforms

Technical Report · DOI: https://doi.org/10.2172/1846568 · OSTI ID: 1846568

Deployment and execution of machine learning tasks on extreme-scale computing platforms face several significant technical challenges:

1) High computing cost incurred by dense networks. The computing workload of deep networks with densely connected topologies grows rapidly with network size, imposing a non-scalable computing model on extreme-scale computing platforms.
2) Non-optimized workload distribution. Many advanced deep learning techniques, e.g., sparsification and irregular network topologies, produce very unbalanced workload distributions on extreme-scale computing platforms. Computation efficiency is greatly hindered by the resulting data and computation redundancies, as well as by the long tails of nodes carrying the heaviest workloads.
3) Constraints in data movement and I/O bottlenecks. Inter-node data movement in extreme-scale computing platforms incurs high energy and latency costs and is subject to I/O bandwidth constraints.
4) Generalization of algorithm realization and acceleration across computing platforms. The wide variety of machine learning algorithms and of extreme-scale computing platform architectures makes it very challenging to derive a generalized method for algorithm realization and acceleration, which is nonetheless what domain scientists and interested users require.

We call the above challenges Smale's Problems in Machine Learning and Understanding for High-Performance Computing Scientific Discovery. The objective of our three-year research project is to develop a holistic set of innovations at the structure, assembly, and acceleration layers of machine learning algorithms to address these challenges in algorithm deployment and execution. Three tasks are performed. At the algorithm structure level, we investigate techniques that structurally sparsify the topology of deep networks to reduce computing workload; we also study clustering and pruning techniques that optimize workload distribution across extreme-scale computing platforms. At the algorithm assembly level, we derive a unified learning framework with unsupervised transfer learning and dynamic growing capabilities; novel training methods are also explored to enhance the training efficiency of the proposed framework. At the algorithm acceleration level, we develop a series of techniques that accelerate sparse matrix operations, one of the core computations in deep learning, and that optimize memory access on the target platforms.

Our proposed techniques attack the fundamental problems of running machine learning algorithms on extreme-scale computing platforms by vertically integrating solutions at these three closely entangled layers, paving a long-term scaling path for machine learning applications in the DOE context. The three tasks corresponding to the respective research orientations above are performed during the three-year project period with our collaborators at ORNL. The outcome of the project is anticipated to form a holistic solution set of novel algorithms and network topologies, efficient training techniques, and fast acceleration methods that promote the computing scalability of machine learning applications of particular interest to DOE.
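The structural sparsification pursued at the algorithm structure level can be illustrated with a small sketch. The snippet below is a minimal, hypothetical example (not the project's implementation) that adds a group-Lasso-style penalty over whole convolution filters in PyTorch, so that entire filters are driven toward zero and can be pruned as a unit; this is the kind of structured sparsity that reduces dense-network workload while avoiding irregular, hard-to-balance weight patterns.

```python
# Minimal sketch (assumed PyTorch usage); illustrates structured, filter-level
# sparsity via a group-Lasso penalty. Illustrative only, not the project's code.
import torch
import torch.nn as nn

def group_lasso_penalty(conv: nn.Conv2d) -> torch.Tensor:
    """Sum of L2 norms of each output filter: pushes whole filters to zero."""
    # weight shape: (out_channels, in_channels, kH, kW)
    return conv.weight.flatten(start_dim=1).norm(dim=1).sum()

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
lam = 1e-4  # regularization strength (illustrative value)

x = torch.randn(8, 3, 32, 32)
out = model(x)
loss = out.mean()  # placeholder task loss
loss = loss + lam * sum(group_lasso_penalty(m) for m in model.modules()
                        if isinstance(m, nn.Conv2d))
loss.backward()
optimizer.step()
```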
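The payoff targeted at the algorithm acceleration level, where sparse matrix operations become the dominant kernel, can likewise be seen in a small sketch. The example below uses SciPy's CSR format purely for exposition (an assumed, generic illustration, not one of the accelerators developed in the project) to show that storage and arithmetic scale with the nonzero count rather than with the full dense dimensions.

```python
# Minimal sketch using SciPy (assumed dependency); shows how storing only the
# nonzeros in CSR format reduces both memory and multiply-accumulate work.
import numpy as np
from scipy.sparse import random as sparse_random

n = 4096
density = 0.01  # 1% nonzeros, illustrative of a heavily pruned layer
W = sparse_random(n, n, density=density, format="csr", dtype=np.float32)
x = np.random.rand(n).astype(np.float32)

y = W @ x  # sparse matrix-vector product touches only the stored nonzeros

dense_flops = 2 * n * n   # dense mat-vec multiply-accumulates
sparse_flops = 2 * W.nnz  # work proportional to the nonzero count
print(f"nonzeros: {W.nnz}, dense FLOPs: {dense_flops}, sparse FLOPs: {sparse_flops}")
```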

Research Organization:
Duke Univ., Durham, NC (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
SC0018064
OSTI ID:
1846568
Report Number(s):
DOE-Duke-18064-1
Resource Relation:
Related Information:
1. F. Chen, L. Song, and Y. Chen, “ReGAN: A Pipelined ReRAM-Based Accelerator for Generative Adversarial Networks,” Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2018, pp. 178-183. DOI: 10.1109/ASPDAC.2018.8297302.
2. L. Song, Y. Zhuo, X. Qian, H. Li, and Y. Chen, “GraphR: Accelerating Graph Processing Using ReRAM,” International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2018, pp. 531-543. DOI: 10.1109/HPCA.2018.00052.
3. B. Li, L. Song, F. Chen, X. Qian, Y. Chen, and H. Li, “ReRAM-based Accelerator for Deep Learning,” Design, Automation & Test in Europe (DATE), Mar. 2018, pp. 815-820. DOI: 10.23919/DATE.2018.8342118.
4. C. Min, J. Mao, H. Li, and Y. Chen, “NeuralHMC: An Efficient HMC-based Accelerator for Deep Neural Networks,” Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2019, pp. 394-399. DOI: 10.1145/3287624.3287642.
5. F. Chen, L. Song, H. Li, and Y. Chen, “ZARA: A Novel Zero-free Dataflow Accelerator for Generative Adversarial Networks in 3D ReRAM,” Design Automation Conference (DAC), Jun. 2019, Article no. 133. DOI: 10.1145/3316781.3317936.
6. J. Mao, Q. Yang, A. Li, H. Li, and Y. Chen, “MobiEye: An Efficient Cloud-based Video Detection System for Real-time Mobile Applications,” Design Automation Conference (DAC), Jun. 2019, Article no. 102. DOI: 10.1145/3316781.3317865.
7. L. Song, J. Mao, Y. Zhuo, X. Qian, H. Li, and Y. Chen, “HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array,” International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2019, pp. 56-68. DOI: 10.1109/HPCA.2019.00027.
8. W. Wen, F. Yan, Y. Chen, and H. Li, “AutoGrow: Automatic Layer Growing in Deep Convolutional Networks,” ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Aug. 2020, pp. 833-841. DOI: 10.1145/3394486.3403126.
9. X. Liu, M. Mao, X. Bi, H. Li, and Y. Chen, “Exploring Applications of STT-RAM in GPU Architectures,” IEEE Transactions on Circuits and Systems I (TCAS-I), vol. 68, no. 1, Jan. 2021, pp. 238-249. DOI: 10.1109/TCSI.2020.3031895.
Country of Publication:
United States
Language:
English
