Evaluation of pre-training large language models on leadership-class supercomputers
Journal Article · Journal of Supercomputing
Large language models (LLMs) have rapidly risen to the center stage of artificial intelligence as foundation models applicable to many downstream learning tasks. However, how to effectively build, train, and serve such models for high-stakes, first-principles-based scientific use cases is both of great interest and a great challenge. Moreover, pre-training LLMs with billions or even trillions of parameters can be prohibitively expensive, not only for academic institutions but also for well-funded industrial and government labs. Furthermore, the energy cost and environmental impact of developing LLMs must be kept in mind. In this work, we conduct a first-of-its-kind performance analysis to understand the time and energy cost of pre-training LLMs on the Department of Energy (DOE)’s leadership-class supercomputers. Employing state-of-the-art distributed training techniques, we evaluate the computational performance of various parallelization approaches at scale for a range of model sizes and establish a projection model for the cost of full training. Our findings provide baseline results, best practices, and heuristics for pre-training such large models that should be valuable to the HPC community at large. We also offer insights and optimization strategies for using the first exascale computing system, Frontier, to train models of the size of GPT-3 and beyond.
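The cost-projection model itself is described in the article; as a rough illustration of the kind of back-of-the-envelope estimate involved, the sketch below combines the common 6 × parameters × tokens FLOP approximation with assumed per-device throughput, utilization, and power figures. All numbers in the example are illustrative assumptions, not values measured or reported in the paper.

```python
# Minimal sketch of a training-cost projection for a dense transformer.
# The 6 * params * tokens FLOP estimate is a standard approximation;
# the throughput, utilization, and power values below are assumptions.

def projected_training_cost(
    params: float,               # model size in parameters (e.g., 175e9 for GPT-3)
    tokens: float,               # training tokens (e.g., 300e9)
    num_devices: int,            # GPUs/GCDs used for the run
    peak_flops_per_device: float,  # peak FLOP/s per device in the chosen precision
    utilization: float,          # achieved fraction of peak (hardware FLOP utilization)
    watts_per_device: float,     # assumed average power draw per device
):
    """Return (training days, device energy in MWh) for a full pre-training run."""
    total_flops = 6.0 * params * tokens                       # dense-transformer FLOP estimate
    effective_rate = num_devices * peak_flops_per_device * utilization
    seconds = total_flops / effective_rate
    days = seconds / 86_400
    energy_mwh = num_devices * watts_per_device * seconds / 3.6e9  # W*s -> MWh
    return days, energy_mwh


if __name__ == "__main__":
    # Hypothetical GPT-3-scale run on a Frontier-like partition (illustrative figures).
    days, mwh = projected_training_cost(
        params=175e9,
        tokens=300e9,
        num_devices=1024,
        peak_flops_per_device=190e12,  # assumed BF16 peak per MI250X GCD pair; illustrative
        utilization=0.30,              # assumed sustained fraction of peak
        watts_per_device=500.0,        # assumed average device power
    )
    print(f"Projected training time: {days:.1f} days, device energy: {mwh:.0f} MWh")
```

With these assumed inputs the projection works out to roughly two months of wall-clock time and several hundred MWh of device energy, which shows why utilization and parallelization efficiency dominate the cost of full training at this scale.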
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC), Basic Energy Sciences (BES), Scientific User Facilities (SUF)
- Grant/Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1994640
- Journal Information:
- Journal of Supercomputing, Vol. 79, Issue 18; ISSN 0920-8542
- Publisher:
- Springer
- Country of Publication:
- United States
- Language:
- English
Similar Records
- Comparative Study of Large Language Model Architectures on Frontier · Conference · May 2024 · OSTI ID: 2406796
- Optimizing Distributed Training on Frontier for Large Language Models · Conference · May 2024 · OSTI ID: 2438819
- Revealing power, energy and thermal dynamics of a 200PF pre-exascale supercomputer · Conference · November 2021 · OSTI ID: 1833956