SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data Services

Ramesh, Srinivasan; Malony, Allen D.; Carns, Philip; Ross, Robert B.; Dorier, Matthieu; Soumagne, Jerome; Snyder, Shane

doi:10.1109/IPDPS49936.2021.00013


Conference · Thu Dec 31 23:00:00 EST 2020



Microservices are a powerful new way of building, customizing, and deploying distributed services, owing to their flexibility and maintainability. Several large-scale distributed platforms have emerged to serve the growing needs of data-centric workloads and services in commercial computing. Concurrently, high-performance computing (HPC) systems and software are rapidly evolving to meet the demands of diversified applications and heterogeneity. The interplay of hardware factors, software configuration parameters, and the flexibility offered by a microservice architecture makes it nontrivial to estimate the optimal service instantiation for a given application workload. The problem is further exacerbated because these services operate in a dynamic and heterogeneous HPC environment. An optimally integrated service can be vastly more performant than a haphazardly integrated one. Existing HPC performance tools either fail to capture the request-response model of communication inherent to microservices or operate within a narrow scope, limiting the insight that can be gleaned from any one of them in isolation. We propose SYMBIOSYS, a methodology for integrated performance analysis of HPC microservice frameworks and applications, and describe its design and implementation within the context of the Mochi framework. This integration is achieved by combining distributed callpath profiling and tracing with a performance data exchange strategy that collects fine-grained, low-level metrics from the RPC communication library and network layers. The result is a portable, low-overhead performance analysis setup that provides a holistic profile of the dependencies among microservices and how they interact with the Mochi RPC software stack. Using HEPnOS, a production-quality Mochi data service, we demonstrate the low-overhead operation of SYMBIOSYS at scale and use it to identify the root causes of poorly performing service configurations.
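Distributed callpath profiling, as named in the abstract, attributes time to the full chain of RPCs that led to a request rather than only to the handler that ran. The Python sketch below illustrates that general idea under simplifying assumptions; it is not the SYMBIOSYS implementation and does not use any Mochi or Mercury API. The service names (store_event, put, bulk_transfer), their costs, and the dependency graph are hypothetical, time is simulated rather than measured, and RPCs are modeled as local function calls that carry the caller's callpath along with each request.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CallpathProfile:
    # Aggregated statistics keyed by the full distributed callpath (a tuple of RPC names).
    calls: defaultdict = field(default_factory=lambda: defaultdict(int))
    time: defaultdict = field(default_factory=lambda: defaultdict(float))

    def record(self, callpath, elapsed):
        self.calls[callpath] += 1
        self.time[callpath] += elapsed


PROFILE = CallpathProfile()

# Hypothetical per-RPC local costs in simulated time units (not measured data).
LOCAL_COST = {"store_event": 1.0, "put": 2.0, "bulk_transfer": 3.0}

# Hypothetical dependency graph: the nested RPCs each handler issues.
DEPENDENCIES = {"store_event": ["put"], "put": ["bulk_transfer"], "bulk_transfer": []}


def invoke(rpc_name, parent_callpath=()):
    """Simulate one RPC; the caller's callpath is propagated with the request."""
    callpath = parent_callpath + (rpc_name,)
    elapsed = LOCAL_COST[rpc_name]
    for child in DEPENDENCIES[rpc_name]:
        # A nested RPC inherits the ancestry of the request that triggered it.
        elapsed += invoke(child, callpath)
    PROFILE.record(callpath, elapsed)
    return elapsed


# Two client requests that fan out across services, plus one direct call to "put".
invoke("store_event")
invoke("store_event")
invoke("put")

for path in sorted(PROFILE.calls):
    print(" -> ".join(path), PROFILE.calls[path], PROFILE.time[path])
```

Because the callpath travels with each request, the same put handler is reported separately depending on whether a client called it directly or it was issued on behalf of store_event. A flat per-function profile would merge these two cases and hide the cross-service dependency.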


Research Organization: Argonne National Laboratory (ANL)

Sponsoring Organization: USDOE Office of Science - Office of Advanced Scientific Computing Research (ASCR)

DOE Contract Number: AC02-06CH11357

OSTI ID: 1863758

Country of Publication: United States

Language: English

Similar Records

Mochi: Composing Data Services for High-Performance Computing Environments

Journal Article · Thu Jan 16 23:00:00 EST 2020 · Journal of Computer Science and Technology · OSTI ID:1596688

XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing

Journal Article · Tue Apr 02 00:00:00 EDT 2024 · Computing in Science and Engineering · OSTI ID:2545755

mochi-quintain

Software · Thu Jan 05 19:00:00 EST 2017 · OSTI ID:code-69560

Related Subjects

microservices
performance
storage
tools
