DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Optimizing High-Throughput Inference on Graph Neural Networks at Shared Computing Facilities with the NVIDIA Triton Inference Server

Journal Article · Computing and Software for Big Science

Abstract: With machine learning applications now spanning a variety of computational tasks, multi-user shared computing facilities are devoting a rapidly increasing proportion of their resources to such algorithms. Graph neural networks (GNNs), for example, have provided astounding improvements in extracting complex signatures from data and are now widely used in a variety of applications, such as particle jet classification in high energy physics (HEP). However, GNNs also come with an enormous computational penalty that requires the use of GPUs to maintain reasonable throughput. At shared computing facilities, such as those used by physicists at Fermi National Accelerator Laboratory (Fermilab), methodical resource allocation and high throughput at the many-user scale are key to ensuring that resources are being used as efficiently as possible. These facilities, however, primarily provide CPU-only nodes, which proves detrimental to time-to-insight and computational throughput for workflows that include machine learning inference. In this work, we describe how a shared computing facility can use the NVIDIA Triton Inference Server to optimize its resource allocation and computing structure, recovering high throughput while scaling out to multiple users by massively parallelizing their machine learning inference. To demonstrate the effectiveness of this system in a realistic multi-user environment, we use the Fermilab Elastic Analysis Facility augmented with the Triton Inference Server to provide scalable and high-throughput access to a HEP-specific GNN and report on the outcome.

Sponsoring Organization:
USDOE
Grant/Contract Number:
SC0010005; AC02-07CH11359
OSTI ID:
2404433
Journal Information:
Computing and Software for Big Science, Vol. 8, Issue 1; ISSN 2510-2036
Publisher:
Springer Science + Business Media
Country of Publication:
Germany
Language:
English
