Damaris: Addressing performance variability in data management for post-petascale simulations
Abstract
With exascale computing on the horizon, reducing performance variability in data management tasks (storage, visualization, analysis, etc.) is becoming a key challenge in sustaining high performance. Here, this variability significantly impacts the overall application performance at scale and its predictability over time. In this article, we present Damaris, a system that leverages dedicated cores in multicore nodes to offload data management tasks, including I/O, data compression, scheduling of data movements, in situ analysis, and visualization. We evaluate Damaris with the CM1 atmospheric simulation and the Nek5000 computational fluid dynamics simulation on four platforms, including NICS’s Kraken and NCSA’s Blue Waters. Our results show that (1) Damaris fully hides the I/O variability as well as all I/O-related costs, thus making simulation performance predictable; (2) it increases the sustained write throughput by a factor of up to 15 compared with standard I/O approaches; (3) it allows almost perfect scalability of the simulation up to over 9,000 cores, as opposed to state-of-the-art approaches that fail to scale; and (4) it enables a seamless connection to the VisIt visualization software to perform in situ analysis and visualization in a way that impacts neither the performance of the simulation nor its variability. In addition, we extended our implementation of Damaris to also support the use of dedicated nodes and conducted a thorough comparison of the two approaches (dedicated cores and dedicated nodes) for I/O tasks with the aforementioned applications.
- Authors:
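- Dorier, Matthieu; Antoniu, Gabriel; Cappello, Franck; Snir, Marc; Sisneros, Robert; Yildiz, Orcun; Ibrahim, Shadi; Peterka, Tom; Orf, Leigh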
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Inria, Rennes - Bretagne Atlantique Research Centre (France)
- Univ. of Illinois at Urbana-Champaign, Urbana, IL (United States)
- Univ. of Wisconsin, Madison, WI (United States)
- Publication Date:
- 2016-10-01
- Research Org.:
- Argonne National Laboratory (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES); Central Michigan University; National Center for Atmospheric Research
- OSTI Identifier:
- 1346736
- Grant/Contract Number:
- AC02-06CH11357
- Resource Type:
- Accepted Manuscript
- Journal Name:
- ACM Transactions on Parallel Computing
- Additional Journal Information:
- Journal Volume: 3; Journal Issue: 3; Journal ID: ISSN 2329-4949
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Damaris; Dedicated Cores; Dedicated Nodes; Design; Exascale Computing; Experimentation; I/O; In Situ Visualization; Performance
Citation Formats
Dorier, Matthieu, Antoniu, Gabriel, Cappello, Franck, Snir, Marc, Sisneros, Robert, Yildiz, Orcun, Ibrahim, Shadi, Peterka, Tom, and Orf, Leigh. Damaris: Addressing performance variability in data management for post-petascale simulations. United States: N. p., 2016. Web. doi:10.1145/2987371.
Dorier, Matthieu, Antoniu, Gabriel, Cappello, Franck, Snir, Marc, Sisneros, Robert, Yildiz, Orcun, Ibrahim, Shadi, Peterka, Tom, & Orf, Leigh. Damaris: Addressing performance variability in data management for post-petascale simulations. United States. https://doi.org/10.1145/2987371
Dorier, Matthieu, Antoniu, Gabriel, Cappello, Franck, Snir, Marc, Sisneros, Robert, Yildiz, Orcun, Ibrahim, Shadi, Peterka, Tom, and Orf, Leigh. 2016. "Damaris: Addressing performance variability in data management for post-petascale simulations". United States. https://doi.org/10.1145/2987371. https://www.osti.gov/servlets/purl/1346736.
@article{osti_1346736,
title = {Damaris: Addressing performance variability in data management for post-petascale simulations},
author = {Dorier, Matthieu and Antoniu, Gabriel and Cappello, Franck and Snir, Marc and Sisneros, Robert and Yildiz, Orcun and Ibrahim, Shadi and Peterka, Tom and Orf, Leigh},
abstractNote = {With exascale computing on the horizon, reducing performance variability in data management tasks (storage, visualization, analysis, etc.) is becoming a key challenge in sustaining high performance. Here, this variability significantly impacts the overall application performance at scale and its predictability over time. In this article, we present Damaris, a system that leverages dedicated cores in multicore nodes to offload data management tasks, including I/O, data compression, scheduling of data movements, in situ analysis, and visualization. We evaluate Damaris with the CM1 atmospheric simulation and the Nek5000 computational fluid dynamic simulation on four platforms, including NICS’s Kraken and NCSA’s Blue Waters. Our results show that (1) Damaris fully hides the I/O variability as well as all I/O-related costs, thus making simulation performance predictable; (2) it increases the sustained write throughput by a factor of up to 15 compared with standard I/O approaches; (3) it allows almost perfect scalability of the simulation up to over 9,000 cores, as opposed to state-of-the-art approaches that fail to scale; and (4) it enables a seamless connection to the VisIt visualization software to perform in situ analysis and visualization in a way that impacts neither the performance of the simulation nor its variability. In addition, we extended our implementation of Damaris to also support the use of dedicated nodes and conducted a thorough comparison of the two approaches—dedicated cores and dedicated nodes—for I/O tasks with the aforementioned applications.},
doi = {10.1145/2987371},
journal = {ACM Transactions on Parallel Computing},
number = 3,
volume = 3,
place = {United States},
year = {2016},
month = oct
}
Works referenced in this record:
Understanding the causes of performance variability in HPC workloads
conference, January 2005
- Skinner, D.; Kramer, W.
- Proceedings of the IEEE Workload Characterization Symposium, 2005
Parallel I/O performance: From events to ensembles
conference, April 2010
- Uselton, Andrew; Howison, Mark; Wright, Nicholas J.
- 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
A Flexible Framework for Asynchronous in Situ and in Transit Analytics for Scientific Simulations
conference, May 2014
- Dreher, Matthieu; Raffin, Bruno
- 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
High end scientific codes with computational I/O pipelines: improving their end-to-end performance
conference, January 2011
- Zheng, Fang; Cao, Jianting; Dayal, Jai
- Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities - PDAC '11
Scalable I/O forwarding framework for high-performance computing systems
conference, August 2009
- Ali, Nawab; Carns, Philip; Iskra, Kamil
- 2009 IEEE International Conference on Cluster Computing and Workshops
Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS)
conference, January 2008
- Lofstead, Jay F.; Klasky, Scott; Schwan, Karsten
- Proceedings of the 6th international workshop on Challenges of large applications in distributed environments - CLADE '08
On implementing MPI-IO portably and with high performance
conference, January 1999
- Thakur, Rajeev; Gropp, William; Lusk, Ewing
- Proceedings of the sixth workshop on I/O in parallel and distributed systems - IOPADS '99
Electronic poster: co-visualization of full data and in situ data extracts from unstructured grid cfd at 160k cores
conference, January 2011
- Rasquin, Michel; Sahni, Onkar; Fu, Jing
- Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion - SC '11 Companion
Design and Evaluation of Multiple-Level Data Staging for Blue Gene Systems
journal, June 2011
- Isaila, F.; Garcia Blas, J.; Carretero, J.
- IEEE Transactions on Parallel and Distributed Systems, Vol. 22, Issue 6
Enabling high-speed asynchronous data extraction and transfer using DART
journal, January 2010
- Docan, Ciprian; Parashar, Manish; Klasky, Scott
- Concurrency and Computation: Practice and Experience
Scaling parallel I/O performance through I/O delegate and caching system
conference, November 2008
- Nisar, Arifa; Liao, Wei-keng; Choudhary, Alok
- 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
On the role of burst buffers in leadership-class storage systems
conference, April 2012
- Liu, Ning; Cope, Jason; Carns, Philip
- 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)
A Steering Environment for Online Parallel Visualization of Legacy Parallel Simulations
conference, October 2006
- Esnard, Aurelien; Richart, Nicolas; Coulaud, Olivier
- 2006 Tenth IEEE International Symposium on Distributed Simulation and Real-Time Applications
An Adaptive Framework for Simulation and Online Remote Visualization of Critical Climate Applications in Resource-constrained Environments
conference, November 2010
- Malakar, Preeti; Natarajan, Vijay; Vadhiyar, Sathish S.
- 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
pClock: an arrival curve based approach for QoS guarantees in shared storage systems
conference, January 2007
- Gulati, Ajay; Merchant, Arif; Varman, Peter J.
- Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems - SIGMETRICS '07
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
conference, November 2010
- Moody, Adam; Bronevetsky, Greg; Mohror, Kathryn
- 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
MPI-IO/GPFS, an optimized implementation of MPI-IO on top of GPFS
conference, January 2001
- Prost, Jean-Pierre; Treumann, Richard; Hedges, Richard
- Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '01
QoS support for end users of I/O-intensive applications using shared storage systems
conference, January 2011
- Zhang, Xuechen; Davis, Kei; Jiang, Song
- Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
Examples of in transit visualization
conference, January 2011
- Moreland, Kenneth; Hereld, Mark; Papka, Michael E.
- Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities - PDAC '11
ExaViz: a flexible framework to analyse, steer and interact with molecular dynamics simulations
journal, January 2014
- Dreher, Matthieu; Prevoteau-Jonquet, Jessica; Trellet, Mikael
- Faraday Discussions, Vol. 169
Scalable parallel building blocks for custom data analysis
conference, October 2011
- Peterka, Tom; Ross, Robert; Gyulassy, Attila
- 2011 IEEE Symposium on Large Data Analysis and Visualization (LDAV)
CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination
conference, May 2014
- Dorier, Matthieu; Antoniu, Gabriel; Ross, Rob
- 2014 IEEE International Parallel & Distributed Processing Symposium (IPDPS)
The ParaView Coprocessing Library: A scalable, general purpose in situ visualization library
conference, October 2011
- Fabian, Nathan; Moreland, Kenneth; Thompson, David
- 2011 IEEE Symposium on Large Data Analysis and Visualization (LDAV)
Scalable systems software---From mesh generation to scientific visualization: an end-to-end approach to parallel supercomputing
conference, January 2006
- Tu, Tiankai; Yu, Hongfeng; Ramirez-Guzman, Leonardo
- Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06
A Benchmark Simulation for Moist Nonhydrostatic Numerical Models
journal, December 2002
- Bryan, George H.; Fritsch, J. Michael
- Monthly Weather Review, Vol. 130, Issue 12
Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures
conference, November 2010
- Li, Min; Vazhkudai, Sudharshan S.; Butt, Ali R.
- 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
Interactive simulation and visualization
journal, January 1999
- Johnson, C.; Parker, S. G.; Hansen, C.
- Computer, Vol. 32, Issue 12
In-situ processing and visualization for ultrascale simulations
journal, July 2007
- Ma, Kwan-Liu; Wang, Chaoli; Yu, Hongfeng
- Journal of Physics: Conference Series, Vol. 78
Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O
conference, September 2012
- Dorier, Matthieu; Antoniu, Gabriel; Cappello, Franck
- 2012 IEEE International Conference on Cluster Computing (CLUSTER)
Visualizing with VTK: a tutorial
journal, January 2000
- Schroeder, W. J.; Avila, L. S.; Hoffman, W.
- IEEE Computer Graphics and Applications, Vol. 20, Issue 5
Enabling In-situ Execution of Coupled Scientific Workflow on Multi-core Platform
conference, May 2012
- Zhang, Fan; Docan, Ciprian; Parashar, Manish
- 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
DataStager: scalable data staging services for petascale applications
conference, January 2009
- Abbasi, Hasan; Wolf, Matthew; Eisenhauer, Greg
- Proceedings of the 18th ACM international symposium on High performance distributed computing - HPDC '09
In-situ I/O processing: a case for location flexibility
conference, January 2011
- Zheng, Fang; Abbasi, Hasan; Cao, Jianting
- Proceedings of the sixth workshop on Parallel Data Storage - PDSW '11
Managing Variability in the IO Performance of Petascale Storage Systems
conference, November 2010
- Lofstead, Jay; Zheng, Fang; Liu, Qing
- 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
High-level buffering for hiding periodic output cost in scientific simulations
journal, March 2006
- Ma, X.; Lee, J.; Winslett, M.
- IEEE Transactions on Parallel and Distributed Systems, Vol. 17, Issue 3
Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines
conference, June 2011
- Prabhakar, Ramya; Vazhkudai, Sudharshan S.; Kim, Youngjae
- 2011 31st International Conference on Distributed Computing Systems (ICDCS)
Data sieving and collective I/O in ROMIO
conference, January 1999
- Thakur, R.; Gropp, W.; Lusk, E.
- Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation
PreDatA – preparatory data analytics on peta-scale machines
conference, April 2010
- Zheng, Fang; Abbasi, Hasan; Docan, Ciprian
- 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
I/O threads to reduce checkpoint blocking for an electromagnetics solver on Blue Gene/P and Cray XK6
conference, January 2012
- Fu, Jing; Latham, Robert; Min, Misun
- Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers - ROSS '12
In-situ Feature-Based Objects Tracking for Large-Scale Scientific Simulations
conference, November 2012
- Zhang, Fan; Lasluisa, Solomon; Jin, Tong
- 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
Extreme Scaling of Production Visualization Software on Diverse Architectures
journal, May 2010
- Childs, Hank; Pugmire, David; Ahern, Sean
- IEEE Computer Graphics and Applications, Vol. 30, Issue 3
Comparative evaluation of overlap strategies with study of I/O overlap in MPI-IO
journal, October 2008
- Patrick, Christina M.; Son, SeungWoo; Kandemir, Mahmut
- ACM SIGOPS Operating Systems Review, Vol. 42, Issue 6
A study of I/O methods for parallel visualization of large-scale data
journal, February 2005
- Yu, Hongfeng; Ma, Kwan-Liu
- Parallel Computing, Vol. 31, Issue 2
In Situ Visualization at Extreme Scale: Challenges and Opportunities
journal, November 2009
- Ma, Kwan-Liu
- IEEE Computer Graphics and Applications, Vol. 29, Issue 6
In Situ Visualization for Large-Scale Combustion Simulations
journal, May 2010
- Yu, Hongfeng; Grout, Ray W.
- IEEE Computer Graphics and Applications, Vol. 30, Issue 3
Understanding Performance Interference of I/O Workload in Virtualized Cloud Environments
conference, July 2010
- Pu, Xing; Liu, Ling; Mei, Yiduo
- 2010 IEEE International Conference on Cloud Computing (CLOUD)
Concurrent Visualization in a Production Supercomputing Environment
journal, September 2006
- Ellsworth, D.; Green, B.; Henze, C.
- IEEE Transactions on Visualization and Computer Graphics, Vol. 12, Issue 5
Scheduling the I/O of HPC Applications Under Congestion
conference, May 2015
- Gainaru, Ana; Aupy, Guillaume; Benoit, Anne
- 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Works referencing / citing this record:
CoSS: proposing a contract-based storage system for HPC
conference, January 2017
- Dorier, Matthieu; Dreher, Matthieu; Peterka, Tom
- Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems - PDSW-DISCS '17