U.S. Department of Energy
Office of Scientific and Technical Information

Generalizable coordination of large multiscale workflows: challenges and learnings at scale

Conference

The advancement of machine learning techniques and the heterogeneous architectures of most current supercomputers are propelling the demand for large multiscale simulations that can automatically and autonomously couple diverse components and map them to relevant resources to solve complex problems at multiple scales. Nevertheless, despite the recent progress in workflow technologies, current capabilities are limited to coupling two scales. In the first-ever demonstration of using three scales of resolution, we present a scalable and generalizable framework that couples pairs of models using machine learning and in situ feedback. We expand upon the massively parallel Multiscale Machine-Learned Modeling Infrastructure (MuMMI), a recent, award-winning workflow, and generalize the framework beyond its original design. We discuss the challenges and learnings in executing a massive multiscale simulation campaign that utilized over 600,000 node hours on Summit and achieved more than 98% GPU occupancy for more than 83% of the time. We present innovations to enable several orders of magnitude scaling, including simultaneously coordinating 24,000 jobs, and managing several TBs of new data per day and over a billion files in total. Finally, we describe the generalizability of our framework and, with an upcoming open-source release, discuss how the presented framework may be used for new applications.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1842626
Resource Relation:
Conference: Supercomputing 21: International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, Missouri, United States of America, November 14-19, 2021
Country of Publication:
United States
Language:
English

