OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC

Abstract

Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is also an important metric. We investigate applications' locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses, providing detailed insights into the dynamic behavior of parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, which are mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before the line is evicted. Also, for these applications, a byte is accessed on average twice before its cache line is evicted. Moreover, we present runtime cache utilization as well as conventional performance metrics that together give a holistic understanding of cache behavior. To facilitate this research, we built a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that a variable cache line size can improve performance and also conserve power.
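The two utilization figures in the abstract (the fraction of a line's bytes touched before eviction, and how often a byte is accessed before eviction) can be illustrated with a small trace-driven model. The following is a minimal sketch, not the authors' SST-based simulator: it assumes a single-level set-associative LRU cache fed by a toy trace of (address, size) pairs, and it averages access counts over bytes touched at least once. The names UtilizationCache, UtilizationStats, and their fields are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class UtilizationStats:
    """Totals over every line that leaves the cache (eviction or final flush)."""
    lines: int = 0           # cache lines retired
    bytes_resident: int = 0  # lines * line_size (bytes brought into the cache)
    bytes_touched: int = 0   # bytes accessed at least once while resident
    byte_accesses: int = 0   # sum of per-byte access counts

    @property
    def utilization(self) -> float:
        # Fraction of fetched bytes used at least once before the line left.
        return self.bytes_touched / self.bytes_resident if self.bytes_resident else 0.0

    @property
    def accesses_per_touched_byte(self) -> float:
        # Assumption: averaged over touched bytes only, not over all resident bytes.
        return self.byte_accesses / self.bytes_touched if self.bytes_touched else 0.0


class UtilizationCache:
    """Hypothetical set-associative LRU cache that counts accesses per byte."""

    def __init__(self, num_sets: int, ways: int, line_size: int) -> None:
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        # One LRU-ordered dict per set: line address -> per-byte access counters.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.stats = UtilizationStats()

    def _retire(self, counters: list) -> None:
        # Record the usage footprint of a line that is leaving the cache.
        self.stats.lines += 1
        self.stats.bytes_resident += self.line_size
        self.stats.bytes_touched += sum(1 for c in counters if c > 0)
        self.stats.byte_accesses += sum(counters)

    def access(self, addr: int, size: int) -> None:
        end = addr + size
        while addr < end:  # an access may straddle several lines
            line_addr, offset = divmod(addr, self.line_size)
            chunk = min(end - addr, self.line_size - offset)
            lru = self.sets[line_addr % self.num_sets]
            if line_addr in lru:
                lru.move_to_end(line_addr)  # hit: make most recently used
            else:
                if len(lru) == self.ways:  # miss in a full set: evict the LRU line
                    _, victim = lru.popitem(last=False)
                    self._retire(victim)
                lru[line_addr] = [0] * self.line_size
            counters = lru[line_addr]
            for i in range(offset, offset + chunk):
                counters[i] += 1
            addr += chunk

    def flush(self) -> UtilizationStats:
        # Retire every line still resident at the end of the trace.
        for lru in self.sets:
            while lru:
                _, counters = lru.popitem(last=False)
                self._retire(counters)
        return self.stats


if __name__ == "__main__":
    # Toy trace of (address, size) pairs; not data from the paper.
    trace = [(0, 8), (8, 8), (0, 8), (256, 4)]
    for line_size in (16, 64):
        cache = UtilizationCache(num_sets=4, ways=2, line_size=line_size)
        for addr, size in trace:
            cache.access(addr, size)
        stats = cache.flush()
        print(line_size, stats.utilization, stats.accesses_per_touched_byte)
    # prints:
    # 16 0.625 1.4
    # 64 0.15625 1.4
```

Running the same toy trace with 16-byte and 64-byte lines shows how the utilization figure depends on line size, which is the kind of effect behind the abstract's suggestion about variable line sizes; the numbers are illustrative only and say nothing about the benchmarks studied.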

Authors:
Siddique, Nafiul A. [1]; Grubel, Patricia A. [2]; Badawy, Abdel-Hameed A. [1]; Cook, Jeanine [3]
  1. New Mexico State Univ., Las Cruces, NM (United States). Klipsch School of Electrical and Computer Engineering
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  3. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Publication Date:
2017-09-20
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE Laboratory Directed Research and Development (LDRD) Program; U.S. Army Research Laboratory (ARL); National Science Foundation (NSF)
OSTI Identifier:
1394977
Alternate Identifier(s):
OSTI ID: 1399561
Report Number(s):
LA-UR-17-24198; SAND-2017-8114J
Journal ID: ISSN 0920-8542
Grant/Contract Number:  
AC52-06NA25396; W911NF-07-2-0027; AC04-94AL85000
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Supercomputing
Additional Journal Information:
Journal Volume: 74; Journal Issue: 2; Journal ID: ISSN 0920-8542
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Cache Utilization; Locality; Workload Characterization; Cache Line Utilization; Multicore Cache Simulation; Runtime Evaluation; Scratchpad

Citation Formats

Siddique, Nafiul A., Grubel, Patricia A., Badawy, Abdel-Hameed A., and Cook, Jeanine. A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC. United States: N. p., 2017. Web. doi:10.1007/s11227-017-2144-1.
Siddique, Nafiul A., Grubel, Patricia A., Badawy, Abdel-Hameed A., & Cook, Jeanine. A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC. United States. https://doi.org/10.1007/s11227-017-2144-1
Siddique, Nafiul A., Grubel, Patricia A., Badawy, Abdel-Hameed A., and Cook, Jeanine. 2017. "A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC". United States. https://doi.org/10.1007/s11227-017-2144-1. https://www.osti.gov/servlets/purl/1394977.
@article{osti_1394977,
title = {A performance study of the time-varying cache behavior: a study on {APEX}, {Mantevo}, {NAS}, and {PARSEC}},
author = {Siddique, Nafiul A. and Grubel, Patricia A. and Badawy, Abdel-Hameed A. and Cook, Jeanine},
abstractNote = {Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is also an important metric. We investigate applications' locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses, providing detailed insights into the dynamic behavior of parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, which are mostly scientific benchmark suites. Our results indicate that 40\% of the data bytes in a cache line are accessed at least once before the line is evicted. Also, for these applications, a byte is accessed on average twice before its cache line is evicted. Moreover, we present runtime cache utilization as well as conventional performance metrics that together give a holistic understanding of cache behavior. To facilitate this research, we built a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that a variable cache line size can improve performance and also conserve power.},
doi = {10.1007/s11227-017-2144-1},
url = {https://www.osti.gov/biblio/1394977},
journal = {Journal of Supercomputing},
issn = {0920-8542},
number = 2,
volume = 74,
place = {United States},
year = {2017},
month = sep
}

Citation Metrics:
Cited by 1 work (citation information provided by Web of Science)

Works referenced in this record:

Basic block distribution analysis to find periodic behavior and simulation points in applications
conference, January 2001


Run-time spatial locality detection and optimization
conference, January 1997


A New Metric to Measure Cache Utilization for HPC Workloads
conference, January 2016


Hitting the memory wall: implications of the obvious
journal, March 1995


Quantifying Locality In The Memory Access Patterns of HPC Applications
conference, January 2005


Performance characterization of the NAS Parallel Benchmarks in OpenCL
conference, November 2011


Subsetting the SPEC CPU2006 benchmark suite
journal, March 2007


LMStr: Local memory store the case for hardware controlled scratchpad memory for general purpose processors
conference, December 2016


Energy, Power, and Performance Characterization of GPGPU Benchmark Programs
conference, May 2016


Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite
journal, June 2007


The structural simulation toolkit
journal, March 2011

  • Rodrigues et al.
  • ACM SIGMETRICS Performance Evaluation Review, 38(4):37–42

Pin: building customized program analysis tools with dynamic instrumentation
conference, January 2005


Scratchpad memory: design alternative for cache on-chip memory in embedded systems
conference, January 2002


Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
conference, December 2012


Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor
journal, March 2010


Performance Characterization of SPEC CPU2006 Benchmarks on Intel and AMD Platform
conference, March 2009


SPEClite: using representative samples to reduce SPEC CPU2000 workload
conference, January 2001


Predicting whole-program locality through reuse distance analysis
journal, May 2003


Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
conference, January 2004


Exploiting spatial locality in data caches using spatial footprints
conference, January 1998

  • Kumar, S.; Wilkerson, C.
  • Proceedings of the 25th Annual International Symposium on Computer Architecture (ISCA '98)
  • https://doi.org/10.1109/ISCA.1998.694794

Towards Performance Predictive Application-Dependent Workload Characterization
conference, November 2012

  • Alkohlani, Waleed; Cook, Jeanine
  • 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
  • https://doi.org/10.1109/SC.Companion.2012.62

Evaluation techniques for storage hierarchies
journal, January 1970


Automatically characterizing large scale program behavior
journal, December 2002


Cache Utilization as a Locality Metric - A Case Study on the Mantevo Suite
conference, December 2016

  • Siddique, Nafiul Alam; Grubel, Patricia; Badawy, Abdel-Hameed A.
  • 2016 International Conference on Computational Science and Computational Intelligence (CSCI)
  • https://doi.org/10.1109/CSCI.2016.0110

A Benchmark Characterization of the EEMBC Benchmark Suite
journal, September 2009


Controlling cache utilization of HPC applications
conference, January 2011


LMStr: exploring shared hardware controlled scratchpad memory for multicores
conference, January 2017


Data analytics workloads: Characterization and similarity analysis
conference, December 2014

  • Panda, Reena; John, Lizy Kurian
  • 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC)
  • https://doi.org/10.1109/PCCC.2014.7017065

GraphBIG: understanding graph computing in the context of industrial solutions
conference, January 2015

  • Nai, Lifeng; Xia, Yinglong; Tanase, Ilie G.
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15)
  • https://doi.org/10.1145/2807591.2807626

Measuring benchmark similarity using inherent program characteristics
journal, June 2006


Benchmark characterization
journal, January 1991


The PARSEC benchmark suite: characterization and architectural implications
conference, January 2008

  • Bienia, Christian; Kumar, Sanjeev; Singh, Jaswinder Pal
  • Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT '08)
  • https://doi.org/10.1145/1454115.1454128

Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics
conference, October 2006


False sharing and spatial locality in multiprocessor caches
journal, June 1994


Performance Characterization of SPEC CPU2006 Integer Benchmarks on x86-64 Architecture
conference, October 2006


New tiling techniques to improve cache temporal locality
journal, May 1999


DAdHTM: Low overhead dynamically adaptive hardware transactional memory for large graphs a scalability study
conference, August 2017

  • Qayum, Mohammad; Badawy, Abdel-Hameed A.; Cook, Jeanine
  • 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)
  • https://doi.org/10.1109/UIC-ATC.2017.8397653

The time-varying nature of cache utilization: A case study on the Mantevo and Apex benchmarks
conference, August 2017

  • Siddique, Nafiul Alam; Grubel, Patricia A.; Badawy, Abdel-Hameed A.
  • 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)
  • https://doi.org/10.1109/UIC-ATC.2017.8397629

Local memory store (LMStr): A hardware controlled shared scratchpad for multicores
conference, August 2017

  • Siddique, Nafiul A.; Badawy, Abdel-Hameed A.; Cook, Jeanine
  • 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)
  • https://doi.org/10.1109/UIC-ATC.2017.8397630

Works referencing / citing this record:

Design trade-offs for emerging HPC processors based on mobile market technology
journal, March 2019