OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: User-level File Systems Specialized for HPC Workloads Year-End Report

Technical Report · DOI: https://doi.org/10.2172/1544466 · OSTI ID: 1544466
 [1];  [2];  [2]
  1. Florida State Univ., Tallahassee, FL (United States)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

High-performance computing (HPC) clusters attract Deep Learning (DL) training users because of their powerful computation capabilities. While many existing efforts enable deep neural networks to leverage the powerful CPU and GPU processors of leadership HPC systems, large-scale deep learning with larger datasets requires efficient I/O support from the underlying file and storage systems. Because some current and upcoming HPC clusters have large on-node memory or are equipped with NVMe SSDs on compute nodes, more distributed DL training jobs are designed to leverage that storage to hold datasets for efficient access. Our project goal is to design a specialized DL-oriented file system that improves dataset loading performance for any DL training application. Over the past year, the team at Florida State University has performed research activities in three areas: 1) completing a specialized memory-based I/O framework (DeepIO) for improving dataset loading performance in DL applications; 2) proposing a more generalized file system (DLFS) for DL applications on node-local SSDs; and 3) completing the BeeGFS performance evaluation project for Deep Neural Networks.
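The staging pattern the abstract alludes to, copying a dataset from the shared parallel file system onto fast node-local storage (NVMe SSD or memory) before training so that per-epoch reads hit local media, can be sketched as follows. This is an illustrative sketch only: the function and directory names (`stage_dataset`, `local_root`) are assumptions for this example, not the actual DeepIO or DLFS interfaces.

```python
import os
import shutil
import tempfile

def stage_dataset(src_files, local_root):
    """Copy dataset files onto node-local storage and return the local paths.

    In a real deployment, src_files would live on a parallel file system
    (e.g. Lustre or BeeGFS) and local_root would point at a node-local
    NVMe SSD or tmpfs mount; this sketch uses temporary directories.
    """
    os.makedirs(local_root, exist_ok=True)
    local_paths = []
    for src in src_files:
        dst = os.path.join(local_root, os.path.basename(src))
        if not os.path.exists(dst):  # skip shards already staged on this node
            shutil.copy(src, dst)
        local_paths.append(dst)
    return local_paths

# Example: create two tiny "dataset shards", stage them, and read them back
# as a training loop would.
with tempfile.TemporaryDirectory() as shared, tempfile.TemporaryDirectory() as local:
    srcs = []
    for i in range(2):
        p = os.path.join(shared, f"shard{i}.bin")
        with open(p, "wb") as f:
            f.write(bytes([i] * 4))
        srcs.append(p)
    staged = stage_dataset(srcs, os.path.join(local, "cache"))
    data = [open(p, "rb").read() for p in staged]
```

After staging, the training loop reads only from `local_root`, so repeated epochs avoid contending for shared file-system bandwidth with other jobs on the cluster.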

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
AC52-07NA27344
OSTI ID:
1544466
Report Number(s):
LLNL-SR-764802; 954764
Country of Publication:
United States
Language:
English