Roofline model toolkit: A practical tool for architectural and program analysis
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented microbenchmarks implemented with the Message Passing Interface (MPI) and with OpenMP, which is used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism, and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory management mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
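For readers unfamiliar with the Roofline model referenced in the abstract, the bound it places on a kernel can be sketched in a few lines. In its standard form, attainable performance is the lesser of the peak compute rate and the product of memory bandwidth and arithmetic intensity (FLOPs per byte moved). The function names and the machine numbers below are illustrative assumptions. They are not taken from the Roofline Toolkit and are not measurements of Blue Gene/Q or any other system.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Roofline bound in GFLOP/s.

    peak_gflops          -- compute ceiling (GFLOP/s)
    bandwidth_gbs        -- sustained memory bandwidth (GB/s)
    arithmetic_intensity -- FLOPs performed per byte moved
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which the bandwidth and compute ceilings meet."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters, chosen only for illustration.
PEAK_GFLOPS = 200.0
BANDWIDTH_GBS = 25.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    for ai in (0.25, 1.0, 8.0, 16.0):
        bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, ai)
        regime = "memory-bound" if ai < ridge else "compute-bound"
        print(f"AI = {ai:5.2f} FLOP/byte -> bound = {bound:6.2f} GFLOP/s ({regime})")
```

In practice the two ceilings would come from measurement, such as the microbenchmarks described above, or from vendor specifications. A kernel whose arithmetic intensity falls to the left of the ridge point is limited by bandwidth rather than by compute.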
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
- DOE Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1407288
- Country of Publication:
- United States
- Language:
- English
Impact of modern memory subsystems on cache optimizations for stencil computations
- conference, January 2005
A Roofline Model of Energy
- conference, May 2013
Roofline: an insightful visual performance model for multicore architectures
- journal, April 2009
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
- journal, February 2009
Similar Records
GMH: A Message Passing Toolkit for GPU Clusters
A Massively Parallel Adaptive Fast Multipole Method on Heterogeneous Architectures