Validating the simulation of large-scale parallel applications using statistical characteristics
Abstract
Simulation is a widely adopted method to analyze and predict the performance of large-scale parallel applications. Validating the hardware model is highly important for complex simulations with a large number of parameters. Common practice involves calculating the percent error between the projected and the real execution time of a benchmark program. However, in a high-dimensional parameter space, this coarse-grained approach often suffers from parameter insensitivity, which may not be known a priori. Moreover, the traditional approach cannot be applied to the validation of software models, such as application skeletons used in online simulations. In this work, we present a methodology and a toolset for validating both hardware and software models by quantitatively comparing fine-grained statistical characteristics obtained from execution traces. Although statistical information has been used in tasks such as performance optimization, this is the first attempt to apply it to simulation validation. Our experimental results show that the proposed evaluation approach offers a significant improvement in fidelity over evaluation using total execution time, and that the proposed metrics serve as reliable criteria for progressing toward an automated simulation tuning process.
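The contrast the abstract draws can be sketched concretely. The snippet below is a minimal illustration, not the authors' toolset: it assumes two hypothetical per-message timing traces and compares them with (a) percent error of total execution time, the coarse-grained common practice, and (b) a two-sample Kolmogorov-Smirnov statistic over the per-event timings, standing in for the paper's fine-grained statistical comparison (the specific metric is an assumption for illustration).

```python
import bisect

def percent_error(real_total, simulated_total):
    """Coarse-grained validation: relative error of total execution time."""
    return abs(simulated_total - real_total) / real_total * 100.0

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the empirical CDFs of two trace-derived samples."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of sample points <= x.
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in sorted(set(a) | set(b)))

# Hypothetical per-message communication times (ms). The totals are
# identical, so percent error alone cannot distinguish the two traces,
# while the distributional comparison exposes the mismatch.
real_trace      = [1.0, 1.0, 1.0, 1.0, 4.0]
simulated_trace = [1.6, 1.6, 1.6, 1.6, 1.6]

print(percent_error(sum(real_trace), sum(simulated_trace)))  # 0.0
print(ks_statistic(real_trace, simulated_trace))             # 0.8
```

This is the parameter-insensitivity problem in miniature: a simulator can match aggregate runtime perfectly while modeling individual events poorly, and only the finer-grained statistics reveal it.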
- Authors:
- Zhang, Deli; Wilke, Jeremiah; Hendry, Gilbert; Dechev, Damian
- Univ. of Central Florida, Orlando, FL (United States)
- Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Univ. of Central Florida, Orlando, FL (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Publication Date:
- March 1, 2016
- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 1333867
- Report Number(s):
- SAND-2015-2905J
Journal ID: ISSN 2376-3639; 583296
- Grant/Contract Number:
- AC04-94AL85000
- Resource Type:
- Accepted Manuscript
- Journal Name:
- ACM Transactions on Modeling and Performance Evaluation of Computing Systems
- Additional Journal Information:
- Journal Volume: 1; Journal Issue: 1; Journal ID: ISSN 2376-3639
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; measurement; experimentation; simulation evaluation; evaluation metrics; software skeleton
Citation Formats
Zhang, Deli, Wilke, Jeremiah, Hendry, Gilbert, and Dechev, Damian. Validating the simulation of large-scale parallel applications using statistical characteristics. United States: N. p., 2016. Web. doi:10.1145/2809778.
Zhang, Deli, Wilke, Jeremiah, Hendry, Gilbert, & Dechev, Damian. Validating the simulation of large-scale parallel applications using statistical characteristics. United States. https://doi.org/10.1145/2809778
Zhang, Deli, Wilke, Jeremiah, Hendry, Gilbert, and Dechev, Damian. 2016. "Validating the simulation of large-scale parallel applications using statistical characteristics". United States. https://doi.org/10.1145/2809778. https://www.osti.gov/servlets/purl/1333867.
@article{osti_1333867,
title = {Validating the simulation of large-scale parallel applications using statistical characteristics},
author = {Zhang, Deli and Wilke, Jeremiah and Hendry, Gilbert and Dechev, Damian},
abstractNote = {Simulation is a widely adopted method to analyze and predict the performance of large-scale parallel applications. Validating the hardware model is highly important for complex simulations with a large number of parameters. Common practice involves calculating the percent error between the projected and the real execution time of a benchmark program. However, in a high-dimensional parameter space, this coarse-grained approach often suffers from parameter insensitivity, which may not be known a priori. Moreover, the traditional approach cannot be applied to the validation of software models, such as application skeletons used in online simulations. In this work, we present a methodology and a toolset for validating both hardware and software models by quantitatively comparing fine-grained statistical characteristics obtained from execution traces. Although statistical information has been used in tasks like performance optimization, this is the first attempt to apply it to simulation validation. Lastly, our experimental results show that the proposed evaluation approach offers significant improvement in fidelity when compared to evaluation using total execution time, and the proposed metrics serve as reliable criteria that progress toward automating the simulation tuning process.},
doi = {10.1145/2809778},
journal = {ACM Transactions on Modeling and Performance Evaluation of Computing Systems},
number = {1},
volume = {1},
place = {United States},
year = {2016},
month = {mar}
}