Comparative Study of Large Language Model Architectures on Frontier

Yin, Junqi; Bose, Avishek; Cong, Guojing; Lyngaas, Isaac; Anthony, Quentin

doi:10.1109/IPDPS57955.2024.00056

Conference · Wed May 01 04:00:00 EDT 2024

DOI: https://doi.org/10.1109/IPDPS57955.2024.00056 · OSTI ID: 2406796

Yin, Junqi ^[1]; Bose, Avishek ^[1]; Cong, Guojing ^[1]; Lyngaas, Isaac ^[1]; Anthony, Quentin ^[2]

[1] ORNL
[2] Ohio State University

Large language models (LLMs) have garnered significant attention in both the AI community and beyond. Among these, the Generative Pre-trained Transformer (GPT) has emerged as the dominant architecture, spawning numerous variants. However, these variants have undergone pre-training under diverse conditions, including variations in input data, data preprocessing, and training methodologies, resulting in a lack of controlled comparative studies. Here we meticulously examine two prominent open-sourced GPT architectures, GPT-NeoX and LLaMA, leveraging the computational power of Frontier, the world’s first Exascale supercomputer. Employing the same materials science text corpus and a comprehensive end-to-end pipeline, we conduct a comparative analysis of their training and downstream performance. Our efforts culminate in achieving state-of-the-art performance on a challenging materials science benchmark. Furthermore, we investigate the computation and energy efficiency, and propose a computationally efficient method for architecture design. To our knowledge, these pre-trained models represent the largest available for materials science. Our findings provide practical guidance for building LLMs on HPC platforms.

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)

DOE Contract Number: AC05-00OR22725

OSTI ID: 2406796

Country of Publication: United States

Language: English

Similar Records

Evaluation of pre-training large language models on leadership-class supercomputers

Journal Article · Thu Jun 15 20:00:00 EDT 2023 · Journal of Supercomputing · OSTI ID:1994640

chatHPC: Empowering HPC users with large language models

Journal Article · Wed Nov 20 19:00:00 EST 2024 · Journal of Supercomputing · OSTI ID:2538074

Optimizing Distributed Training on Frontier for Large Language Models

Conference · Wed May 01 00:00:00 EDT 2024 · OSTI ID:2438819
