U.S. Department of Energy
Office of Scientific and Technical Information

Scaling SQL to the Supercomputer for Interactive Analysis of Simulation Data

Conference

AI and simulation workloads consume and generate large amounts of data that need to be searched, transformed, and merged with other data. With the goal of treating data as a first-class citizen inside a traditionally compute-centric HPC environment, we explore how accelerators and high-speed interconnects can speed up tasks that otherwise constitute bottlenecks in computational discovery workflows. BlazingSQL is a SQL engine that runs natively on NVIDIA GPUs and supports internode communication for fast analytics on terabyte-scale tabular data sets. We show how a fast interconnect improves query performance when leveraged through the Unified Communication X (UCX) middleware. We envision that future computing platforms will integrate accelerated database query capabilities for immediate and interactive analysis of large simulation data.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1856710
Resource Relation:
Journal Volume: 1512; Conference: Smoky Mountains Computational Sciences and Engineering Conference (SMC), Kingsport, Tennessee, United States of America, 10/18/2021 to 10/20/2021
Country of Publication:
United States
Language:
English
