U.S. Department of Energy
Office of Scientific and Technical Information

SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data Services

Conference

Microservices are a powerful new way of building, customizing, and deploying distributed services owing to their flexibility and maintainability. Several large-scale distributed platforms have emerged to serve the growing needs of data-centric workloads and services in commercial computing. Concurrently, high-performance computing (HPC) systems and software are rapidly evolving to meet the demands of diversified applications and heterogeneity. The interplay of hardware factors, software configuration parameters, and the flexibility offered by a microservice architecture makes it nontrivial to estimate the optimal service instantiation for a given application workload. This problem is exacerbated because these services operate in a dynamic and heterogeneous HPC environment. An optimally integrated service can be vastly more performant than a haphazardly integrated one. Existing performance tools for HPC either fail to understand the request-response model of communication inherent to microservices or operate within a narrow scope, limiting the insight that can be gleaned from employing them in isolation. We propose SYMBIOSYS, a methodology for integrated performance analysis of HPC microservices frameworks and applications. We describe its design and implementation within the context of the Mochi framework. This integration is achieved by combining distributed callpath profiling and tracing with a performance data exchange strategy that collects fine-grained, low-level metrics from the RPC communication library and network layers. The result is a portable, low-overhead performance analysis setup that provides a holistic profile of the dependencies among microservices and how they interact with the Mochi RPC software stack. Using HEPnOS, a production-quality Mochi data service, we demonstrate the low-overhead operation of SYMBIOSYS at scale and use it to identify the root causes of poorly performing service configurations.
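
The abstract's notion of distributed callpath profiling can be illustrated with a minimal Python sketch. It is not the SYMBIOSYS implementation and does not use any Mochi API: the class `CallpathProfiler`, the RPC names `put_event` and `kv_put`, and the tick-based clock are all hypothetical, and the callpath is passed as a function argument within one process, whereas a real system would presumably carry it in RPC metadata across processes. The sketch only shows the core idea that timing is aggregated per chain of RPC names, so the same RPC reached through different callers is profiled separately.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class CallpathProfiler:
    """Hypothetical profiler: aggregates call count and time per RPC callpath."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        # Key: tuple of RPC names from the origin to the current call.
        self.stats = defaultdict(lambda: {"count": 0, "total_s": 0.0})

    @contextmanager
    def rpc(self, parent_path, name):
        # Extend the caller's path; the callee receives it, standing in for
        # ancestry information that would travel with the request.
        path = parent_path + (name,)
        start = self.clock()
        try:
            yield path
        finally:
            entry = self.stats[path]
            entry["count"] += 1
            entry["total_s"] += self.clock() - start


def kv_put(profiler, parent_path):
    """Hypothetical leaf RPC (e.g. a key-value store write)."""
    with profiler.rpc(parent_path, "kv_put"):
        pass  # real work would happen here


def put_event(profiler, parent_path):
    """Hypothetical composed RPC that issues two nested kv_put calls."""
    with profiler.rpc(parent_path, "put_event") as path:
        kv_put(profiler, path)
        kv_put(profiler, path)


def make_tick_clock():
    """Deterministic clock for the demo: advances by 1.0 on every read."""
    ticks = [0.0]

    def clock():
        ticks[0] += 1.0
        return ticks[0]

    return clock


def run_demo():
    profiler = CallpathProfiler(clock=make_tick_clock())
    put_event(profiler, ())  # kv_put reached through put_event
    kv_put(profiler, ())     # kv_put called directly from the origin
    return dict(profiler.stats)


if __name__ == "__main__":
    for path, entry in sorted(run_demo().items()):
        print(" -> ".join(path), entry)
```

In this sketch, `kv_put` appears under two distinct keys, `("put_event", "kv_put")` and `("kv_put",)`, which is what lets a callpath profile attribute cost to a specific dependency chain among services rather than to an RPC name alone.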

Research Organization:
Argonne National Laboratory (ANL)
Sponsoring Organization:
USDOE Office of Science - Office of Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1863758
Country of Publication:
United States
Language:
English
