OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Understanding and Leveraging the I/O Patterns of Emerging Machine Learning Analytics

Conference

The scientific community is currently experiencing unprecedented amounts of data generated by cutting-edge science facilities. Soon facilities will be producing up to 1 PB/s, which will force scientists to use more autonomous techniques to learn from the data. The adoption of machine learning methods, such as deep learning, in large-scale workflows shifts the workflow’s computational and I/O patterns. These changes often include iterative processes and model architecture searches, in which datasets are analyzed multiple times, in different formats and with different model configurations, in order to find accurate, reliable, and efficient learning models. This shift in behavior changes I/O patterns at the application level as well as at the system level. It also brings new challenges for HPC I/O teams, since these patterns produce more complex I/O workloads. In this paper we discuss the I/O patterns of emerging analytical codes that rely on machine learning algorithms and highlight the challenges in designing efficient I/O transfers for such workflows. We comment on how to leverage data access patterns to fetch the required input data more efficiently, in the format and order the application needs, and how to optimize the data path between collaborative processes. We motivate our work and show performance gains with a case study of medical applications.
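The abstract does not describe the paper's implementation, so what follows is only a minimal sketch of what "leveraging data access patterns" can mean, under stated assumptions: a training loop reads every sample once per epoch in a shuffled order derived from a seed, and a model search repeats that loop for several configurations over the same dataset. Two properties follow. The read order of an epoch is known before any read is issued, so a reader can keep a window of upcoming reads in flight; and because trials re-read the same data, a cache shared across trials turns repeated storage reads into memory hits. The names `epoch_order`, `SampleCache`, `prefetching_reader`, and `read_from_storage` are hypothetical helpers for illustration, not the authors' code or any library's API.

```python
import random
import threading
from collections import OrderedDict
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List


def epoch_order(num_samples: int, seed: int, epoch: int) -> List[int]:
    """Shuffled sample order for one epoch, fully determined by (seed, epoch).

    Because the order is reproducible, it is known before any sample is read.
    """
    rng = random.Random(seed * 100_003 + epoch)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


class SampleCache:
    """Hypothetical thread-safe LRU cache of samples, shared by all trials of a model search."""

    def __init__(self, read_from_storage: Callable[[int], bytes], capacity: int):
        self._read = read_from_storage
        self._capacity = capacity
        self._data: "OrderedDict[int, bytes]" = OrderedDict()
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def get(self, idx: int) -> bytes:
        with self._lock:
            if idx in self._data:
                self._data.move_to_end(idx)  # mark as most recently used
                self.hits += 1
                return self._data[idx]
            self.misses += 1
        value = self._read(idx)  # storage read, done outside the lock
        with self._lock:
            self._data[idx] = value
            self._data.move_to_end(idx)
            while len(self._data) > self._capacity:
                self._data.popitem(last=False)  # evict least recently used
        return value


def prefetching_reader(order: List[int], load: Callable[[int], bytes],
                       window: int = 4, workers: int = 2) -> Iterator[bytes]:
    """Yield samples in `order`, keeping up to `window` upcoming reads in flight."""
    window = max(1, window)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        in_flight: Dict[int, Future] = {}  # position in `order` -> pending read
        for pos in range(len(order)):
            # The order is known in advance, so issue reads for the next positions now.
            for ahead in range(pos, min(pos + window, len(order))):
                if ahead not in in_flight:
                    in_flight[ahead] = pool.submit(load, order[ahead])
            yield in_flight.pop(pos).result()


if __name__ == "__main__":
    def read_from_storage(idx: int) -> bytes:
        return f"sample-{idx}".encode()  # stand-in for a file or object-store read

    cache = SampleCache(read_from_storage, capacity=16)
    for seed in (1, 2, 3):  # three model configurations in a search
        for epoch in range(2):
            order = epoch_order(16, seed, epoch)
            samples = list(prefetching_reader(order, cache.get))
            assert samples == [f"sample-{i}".encode() for i in order]
    print(cache.misses, cache.hits)  # → 16 80
```

With a cache large enough to hold the 16 samples, the three configurations of two epochs each perform 96 sample reads, but only the first epoch touches storage (16 misses, 80 hits). The sketch does not address the harder case of a dataset larger than memory, where a generic LRU policy and shuffled epochs interact poorly and a policy informed by the known read order would be needed.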

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
Resource Relation:
Journal Volume: 1512; Conference: Smoky Mountains Computational Science and Engineering Conference (SMC), Kingsport, Tennessee, United States of America, October 18-20, 2021
Country of Publication:
United States
