Application Characterization at Scale: Lessons learned from developing a distributed Open Community Runtime system for High Performance Computing

Landwehr, Joshua B.; Suetterlein, Joshua D.; Marquez, Andres; Manzano Franco, Joseph B.; Gao, Guang R.

doi:10.1145/2903150.2903166

Conference · Mon May 16 00:00:00 EDT 2016

DOI: https://doi.org/10.1145/2903150.2903166 · OSTI ID: 1322521

Since 2012, the U.S. Department of Energy’s X-Stack program has been developing solutions, including runtime systems, programming models, languages, compilers, and tools, for Exascale system software to address crucial performance and power requirements. Fine-grain programming models and runtime systems show great potential to efficiently utilize the underlying hardware, and they are therefore essential to many X-Stack efforts. An abundance of small tasks can better utilize the vast parallelism available on current and future machines. Moreover, finer tasks can recover faster and adapt better because they carry less state and control. Nevertheless, current applications have been written to exploit older paradigms (such as Communicating Sequential Processes and Bulk Synchronous Parallel processing). To fully utilize the advantages of these new systems, applications need to be adapted to the new paradigms. As part of the applications’ porting process, in-depth characterization studies, focused on both application characteristics and runtime features, need to take place to fully understand the application performance bottlenecks and how to resolve them. This paper presents a characterization study for a novel high-performance runtime system, called the Open Community Runtime (OCR), using key HPC kernels as its vehicle. This study makes the following contributions: one of the first high-performance, fine-grain, distributed-memory runtime systems implementing the OCR standard (version 0.99a); and a characterization study of key HPC kernels in terms of runtime primitives, running in both intra-node and inter-node environments. Running on a general-purpose cluster, we have found up to a 1635x relative speed-up for a parallel tiled Cholesky kernel on 128 nodes with 16 cores each and a 1864x relative speed-up for a parallel tiled Smith-Waterman kernel on 128 nodes with 30 cores.
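To put the quoted speed-ups in context, the short sketch below divides each one by the total core count to get a rough parallel efficiency. It rests on two assumptions that the abstract does not confirm: that "relative speed-up" is measured against a single-core baseline, and that the Smith-Waterman run used 30 cores on each of its 128 nodes. The function name `parallel_efficiency` and the `runs` table are illustrative only and do not come from the paper.

```python
def parallel_efficiency(speedup: float, nodes: int, cores_per_node: int) -> float:
    """Return speed-up divided by total core count.

    Assumption: the speed-up is relative to a single-core run. If the paper's
    baseline is a single node instead, this ratio is not a true efficiency.
    """
    total_cores = nodes * cores_per_node
    return speedup / total_cores


# Speed-ups and machine sizes as quoted in the abstract.
# Assumption: "128 nodes with 30 cores" means 30 cores per node.
runs = {
    "tiled Cholesky": (1635.0, 128, 16),
    "tiled Smith-Waterman": (1864.0, 128, 30),
}

for kernel, (speedup, nodes, cores_per_node) in runs.items():
    efficiency = parallel_efficiency(speedup, nodes, cores_per_node)
    print(f"{kernel}: {nodes * cores_per_node} cores, "
          f"speed-up {speedup:.0f}x, efficiency {efficiency:.2f}")
```

Under these assumptions, the Cholesky run uses 2048 cores and the Smith-Waterman run 3840 cores, so the larger absolute speed-up of the latter corresponds to a lower per-core efficiency.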

Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 1322521

Report Number(s): PNNL-SA-116663; KJ0402000

Resource Relation: Conference: Proceedings of the ACM International Conference on Computing Frontiers (CF 2016), May 16-28, 2016, Como, Italy

Country of Publication: United States

Language: English

Similar Records

Asynchronous Runtimes in Action: An Introspective Framework for a Next Gen Runtime

Conference · Mon May 23 00:00:00 EDT 2016 · OSTI ID:1322521

Suetterlein, Joshua D.; Landwehr, Joshua B.; Marquez, Andres; +2 more

Performance and energy impact of OpenMP runtime configurations on power constrained systems

Journal Article · Fri Apr 19 00:00:00 EDT 2019 · Sustainable Computing · OSTI ID:1322521

Shahneous Bari, Md Abdullah; Malik, Abid M.; Qawasmeh, Ahmad; +1 more

Steps toward fault-tolerant quantum chemistry.

Technical Report · Sat May 01 00:00:00 EDT 2010 · OSTI ID:1322521

Taube, Andrew Garvin

Related Subjects

Application characterization
extreme scale runtime systems
execution models
