User-level File Systems Specialized for HPC Workloads Year-End Report
- Florida State Univ., Tallahassee, FL (United States)
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
High-performance computing (HPC) clusters attract Deep Learning (DL) training users because of the clusters' powerful computation capabilities. While many existing efforts enable deep neural networks to leverage the powerful CPU and GPU processors of leadership HPC systems, large-scale deep learning with larger datasets requires efficient I/O support from the underlying file and storage systems. Because some current and upcoming HPC clusters have large on-node memory or are equipped with NVMe SSDs on compute nodes, more distributed DL training jobs are expected to leverage this node-local storage to hold datasets for efficient access. Our project goal is to design a specialized DL-oriented file system that improves dataset loading performance for any DL training application. Over the past year, the team at Florida State University has performed research activities in three areas: 1) completing a specialized memory-based I/O framework (DeepIO) for improving dataset loading performance of DL applications; 2) proposing a more generalized file system (DLFS) for DL applications on node-local SSDs; and 3) completing the BeeGFS performance evaluation project for Deep Neural Networks.
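The core idea above, staging dataset samples from a shared parallel file system into node-local memory or NVMe SSD so that later epochs read locally, can be sketched as follows. This is a minimal illustrative sketch only; the class and method names are hypothetical and do not reflect the actual DeepIO or DLFS interfaces, and an in-memory dict stands in for node-local storage.

```python
import os
import tempfile

class NodeLocalCache:
    """Hypothetical sketch of node-local dataset staging for DL training.

    On first access, a sample is read from the shared (parallel) file
    system path and cached; subsequent epochs are served from the
    node-local copy. An in-memory dict stands in for on-node memory
    or an NVMe SSD. This is NOT the DeepIO/DLFS API.
    """

    def __init__(self, source_dir):
        self.source_dir = source_dir  # shared file system location
        self._cache = {}              # stands in for node-local storage

    def get(self, name):
        # Cold access: read through from the shared file system and stage.
        if name not in self._cache:
            with open(os.path.join(self.source_dir, name), "rb") as f:
                self._cache[name] = f.read()
        # Warm access (later epochs): served from node-local storage.
        return self._cache[name]

if __name__ == "__main__":
    # Simulate a shared dataset directory with one sample file.
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "sample0.bin"), "wb") as f:
            f.write(b"\x01\x02\x03")
        cache = NodeLocalCache(d)
        first = cache.get("sample0.bin")   # cold read, stages locally
        second = cache.get("sample0.bin")  # warm read, node-local
        assert first == second == b"\x01\x02\x03"
```

The benefit in practice comes from repeated epochs: after the first pass over the dataset, every read hits node-local storage instead of the contended parallel file system.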
- Research Organization:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number:
- AC52-07NA27344
- OSTI ID:
- 1544466
- Report Number(s):
- LLNL-SR-764802; 954764
- Country of Publication:
- United States
- Language:
- English