Analyzing inference workloads for spatiotemporal modeling
Journal Article · Future Generations Computer Systems
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Ensuring power grid resiliency, forecasting climate conditions, and optimizing transportation infrastructure are among the many application areas where data is collected in both space and time. Spatiotemporal modeling leverages machine learning and deep learning to capture these patterns, forecast future trends, and support critical decision-making. Once a model is trained offline, deploying it in the field for near real-time inference can be challenging, because performance varies significantly with the environment, the available compute resources, and the tolerance for ambiguity in results. Users deploying spatiotemporal models to solve complex problems can benefit from analytical studies that consider a plethora of system adaptations and expose the associated performance-quality trade-offs. To facilitate the co-design of next-generation hardware architectures for field deployment of trained models, it is critical to characterize the workloads of these deep learning (DL) applications during inference and to assess their computational patterns at different levels of the execution stack. In this paper, we develop several variants of deep learning applications that use spatiotemporal data from dynamical systems. We study the associated computational patterns for inference workloads at different levels, considering relevant models (Long Short-Term Memory, Convolutional Neural Network, and Spatio-Temporal Graph Convolutional Network), DL frameworks (TensorFlow and PyTorch), numerical precisions (FP16, FP32, AMP, INT16, and INT8), inference runtimes (ONNX and AITemplate), post-training quantization (TensorRT), and platforms (NVIDIA DGX A100 and SambaNova SN10 RDU). Overall, our findings indicate that although mixed-precision models and post-training quantization hold promise for spatiotemporal modeling, extracting efficiency from contemporary GPU systems can be challenging. Instead, co-designing custom accelerators with optimized high-level synthesis frameworks (such as the SODA High-Level Synthesizer for customized FPGA/ASIC targets) allows workload-specific adjustments that enhance efficiency.
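As a concrete illustration of the kinds of inference-workload variants the abstract enumerates, the sketch below (not taken from the paper; model width, window length, and file names are illustrative assumptions) runs a small PyTorch LSTM forecaster in FP32 and under automatic mixed precision (AMP), exports it to ONNX for a cross-framework runtime, and applies post-training quantization to INT8 weights. The paper evaluates TensorRT for post-training quantization; onnxruntime's dynamic quantizer is used here only as one accessible alternative.

```python
# Minimal sketch of the inference-workload variants described in the abstract.
# Sizes and file names are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Predict the next time step from a window of multivariate observations."""
    def __init__(self, n_features: int = 8, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)            # (batch, time, hidden)
        return self.head(out[:, -1, :])  # forecast for the next step

model = LSTMForecaster().eval()
window = torch.randn(32, 24, 8)  # 32 sequences, 24 time steps, 8 sensors

with torch.no_grad():
    fp32_pred = model(window)  # FP32 baseline inference
    # AMP variant: eligible ops run in reduced precision, the rest in FP32.
    # (The paper runs AMP on an NVIDIA A100 GPU; CPU/bfloat16 is used here
    # so the sketch runs anywhere.)
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        amp_pred = model(window)

# Export to ONNX so a cross-framework runtime (e.g., ONNX Runtime) can serve it.
torch.onnx.export(model, window, "lstm_forecaster.onnx",
                  input_names=["window"], output_names=["forecast"],
                  dynamic_axes={"window": {0: "batch"}})

# Post-training quantization of the exported graph to INT8 weights.
# (The paper uses TensorRT; onnxruntime's dynamic quantizer is shown instead.)
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic("lstm_forecaster.onnx", "lstm_forecaster.int8.onnx",
                 weight_type=QuantType.QInt8)
```

Comparing fp32_pred, amp_pred, and the quantized model's outputs on held-out windows is one way to quantify the performance-quality trade-offs the abstract refers to.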
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- Grant/Contract Number:
- AC05-76RL01830
- OSTI ID:
- 2513464
- Report Number(s):
- PNNL-SA-187612
- Journal Information:
- Future Generations Computer Systems, Vol. 163; ISSN 0167-739X
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
Similar Records
Analyzing Deep Learning Model Inferences for Image Classification using OpenVINO
Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing · Conference · December 31, 2019 · OSTI ID: 1804060
Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing · Conference · July 3, 2017 · OSTI ID: 1373860
Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing · Journal Article · May 4, 2018 · Future Generations Computer Systems · OSTI ID: 1617450