DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC

Abstract

Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is significantly important. We investigate the application’s locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses providing detailed insights into the dynamic application behavior on parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization, as well as, conventional performance metrics that illustrate a holistic understanding of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that variable cache line size can result in better performance and can also conserve power.
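
The two utilization figures quoted in the abstract (the share of bytes in a line touched before eviction, and the accesses per byte before eviction) can be illustrated with a small sketch. This is not the authors' simulator and does not use the Structural Simulation Toolkit. It assumes a tiny fully associative LRU cache, and the names `line_utilization`, `num_lines`, and `line_size` are hypothetical. It also assumes that "accesses per byte" is averaged over every byte of line capacity, which the abstract does not specify.

```python
from collections import OrderedDict


def line_utilization(accesses, num_lines=4, line_size=64):
    """Replay (address, size) accesses through a toy LRU cache.

    Returns a pair:
      - fraction of line bytes touched at least once before the line is retired
      - mean number of accesses per line byte before the line is retired
    Both definitions are assumptions made for this illustration.
    """
    cache = OrderedDict()  # line tag -> list of per-byte access counts
    touched_total = 0
    access_total = 0
    retired = 0

    def retire(counts):
        # Fold one evicted (or flushed) line into the running totals.
        nonlocal touched_total, access_total, retired
        touched_total += sum(1 for c in counts if c > 0)
        access_total += sum(counts)
        retired += 1

    for addr, size in accesses:
        for byte_addr in range(addr, addr + size):
            tag, offset = divmod(byte_addr, line_size)
            if tag in cache:
                cache.move_to_end(tag)  # mark as most recently used
            else:
                if len(cache) >= num_lines:
                    _, victim = cache.popitem(last=False)  # evict the LRU line
                    retire(victim)
                cache[tag] = [0] * line_size
            cache[tag][offset] += 1

    # Flush lines still resident at the end so they are counted as well.
    for counts in cache.values():
        retire(counts)

    if retired == 0:
        return 0.0, 0.0
    total_bytes = retired * line_size
    return touched_total / total_bytes, access_total / total_bytes


if __name__ == "__main__":
    # Two reads of the first 8 bytes of line 0, one read of 16 bytes of line 1.
    print(line_utilization([(0, 8), (0, 8), (64, 16)]))  # (0.1875, 0.25)
```

In this toy trace only 24 of the 128 line bytes are ever touched, which is the kind of under-use that motivates the abstract's remark about variable cache line sizes.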

Authors:
 Siddique, Nafiul A. [1]; Grubel, Patricia A. [2]; Badawy, Abdel-Hameed A. [1]; Cook, Jeanine [3]
  1. New Mexico State Univ., Las Cruces, NM (United States). Klipsch School of Electrical and Computer Engineering
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  3. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Publication Date:
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE Laboratory Directed Research and Development (LDRD) Program; U.S. Army Research Laboratory (ARL); National Science Foundation (NSF)
OSTI Identifier:
1394977
Alternate Identifier(s):
OSTI ID: 1399561
Report Number(s):
LA-UR-17-24198; SAND-2017-8114J
Journal ID: ISSN 0920-8542
Grant/Contract Number:  
AC52-06NA25396; W911NF-07-2-0027; AC04-94AL85000
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Supercomputing
Additional Journal Information:
Journal Volume: 74; Journal Issue: 2; Journal ID: ISSN 0920-8542
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Cache Utilization; Locality; Workload Characterization; Cache Line Utilization; Multicore Cache Simulation; Runtime Evaluation; Scratchpad

Citation Formats

Siddique, Nafiul A., Grubel, Patricia A., Badawy, Abdel-Hameed A., and Cook, Jeanine. A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC. United States: N. p., 2017. Web. doi:10.1007/s11227-017-2144-1.
Siddique, Nafiul A., Grubel, Patricia A., Badawy, Abdel-Hameed A., & Cook, Jeanine. A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC. United States. doi:10.1007/s11227-017-2144-1.
Siddique, Nafiul A., Grubel, Patricia A., Badawy, Abdel-Hameed A., and Cook, Jeanine. 2017. "A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC". United States. doi:10.1007/s11227-017-2144-1. https://www.osti.gov/servlets/purl/1394977.
@article{osti_1394977,
title = {A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC},
author = {Siddique, Nafiul A. and Grubel, Patricia A. and Badawy, Abdel-Hameed A. and Cook, Jeanine},
abstractNote = {Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is significantly important. We investigate the application’s locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses providing detailed insights into the dynamic application behavior on parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization, as well as, conventional performance metrics that illustrate a holistic understanding of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that variable cache line size can result in better performance and can also conserve power.},
doi = {10.1007/s11227-017-2144-1},
journal = {Journal of Supercomputing},
number = 2,
volume = 74,
place = {United States},
year = {2017},
month = {9}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Works referenced in this record:

Basic block distribution analysis to find periodic behavior and simulation points in applications
conference, January 2001

  • Sherwood, T.; Perelman, E.; Calder, B.
  • Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques
  • DOI: 10.1109/PACT.2001.953283

Run-time spatial locality detection and optimization
conference, January 1997

  • Johnson, T. L.; Merten, M. C.; Hwu, W. W.
  • Proceedings of 30th Annual International Symposium on Microarchitecture
  • DOI: 10.1109/MICRO.1997.645797

A New Metric to Measure Cache Utilization for HPC Workloads
conference, January 2016

  • Deshpande, Aditya M.; Draper, Jeffrey T.
  • Proceedings of the Second International Symposium on Memory Systems - MEMSYS '16
  • DOI: 10.1145/2989081.2989125

Hitting the memory wall: implications of the obvious
journal, March 1995

  • Wulf, Wm. A.; McKee, Sally A.
  • ACM SIGARCH Computer Architecture News, Vol. 23, Issue 1
  • DOI: 10.1145/216585.216588

Quantifying Locality In The Memory Access Patterns of HPC Applications
conference, January 2005

  • Weinberg, J.; McCracken, M. O.; Strohmaier, E.
  • ACM/IEEE SC 2005 Conference (SC'05)
  • DOI: 10.1109/SC.2005.59

Performance characterization of the NAS Parallel Benchmarks in OpenCL
conference, November 2011

  • Seo, Sangmin; Jo, Gangwon; Lee, Jaejin
  • 2011 IEEE International Symposium on Workload Characterization (IISWC)
  • DOI: 10.1109/IISWC.2011.6114174

Subsetting the SPEC CPU2006 benchmark suite
journal, March 2007

  • Phansalkar, Aashish; Joshi, Ajay; John, Lizy K.
  • ACM SIGARCH Computer Architecture News, Vol. 35, Issue 1
  • DOI: 10.1145/1241601.1241616

LMStr: Local memory store the case for hardware controlled scratchpad memory for general purpose processors
conference, December 2016

  • Siddique, Nafiul Alam; Badawy, Abdel-Hameed A.; Cook, Jeanine
  • 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC)
  • DOI: 10.1109/PCCC.2016.7820661

Energy, Power, and Performance Characterization of GPGPU Benchmark Programs
conference, May 2016

  • Coplin, Jared; Burtscher, Martin
  • 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
  • DOI: 10.1109/IPDPSW.2016.164

Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite
journal, June 2007

  • Phansalkar, Aashish; Joshi, Ajay; John, Lizy K.
  • ACM SIGARCH Computer Architecture News, Vol. 35, Issue 2
  • DOI: 10.1145/1273440.1250713

The structural simulation toolkit
journal, March 2011

  • Rodrigues, A. F.; CooperBalls, E.; Jacob, B.
  • ACM SIGMETRICS Performance Evaluation Review, Vol. 38, Issue 4
  • DOI: 10.1145/1964218.1964225

Pin: building customized program analysis tools with dynamic instrumentation
conference, January 2005

  • Luk, Chi-Keung; Cohn, Robert; Muth, Robert
  • Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation - PLDI '05
  • DOI: 10.1145/1065010.1065034

Scratchpad memory: design alternative for cache on-chip memory in embedded systems
conference, January 2002

  • Banakar, Rajeshwari; Steinke, Stefan; Lee, Bo-Sik
  • Proceedings of the tenth international symposium on Hardware/software codesign - CODES '02
  • DOI: 10.1145/774789.774805

Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
conference, December 2012

  • Kumar, Snehasish; Zhao, Hongzhou; Shriraman, Arrvindh
  • 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
  • DOI: 10.1109/MICRO.2012.42

Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor
journal, March 2010

  • Conway, Pat; Kalyanasundharam, Nathan; Donley, Gregg
  • IEEE Micro, Vol. 30, Issue 2
  • DOI: 10.1109/MM.2010.31

Performance Characterization of SPEC CPU2006 Benchmarks on Intel and AMD Platform
conference, March 2009

  • Li, Shengmei; Cheng, Buqi; Gao, Xingyu
  • 2009 First International Workshop on Education Technology and Computer Science
  • DOI: 10.1109/ETCS.2009.288

SPEClite: using representative samples to reduce SPEC CPU2000 workload
conference, January 2001

  • Todi, R.
  • Proceedings of the Fourth Annual IEEE International Workshop on Workload Characterization. WWC-4 (Cat. No.01EX538)
  • DOI: 10.1109/WWC.2001.990740

Predicting whole-program locality through reuse distance analysis
journal, May 2003


Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
conference, January 2004

  • Patil, H.; Cohn, R.; Charney, M.
  • 37th International Symposium on Microarchitecture (MICRO-37'04)
  • DOI: 10.1109/MICRO.2004.28

Exploiting spatial locality in data caches using spatial footprints
conference, January 1998

  • Kumar, S.; Wilkerson, C.
  • Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)
  • DOI: 10.1109/ISCA.1998.694794

Towards Performance Predictive Application-Dependent Workload Characterization
conference, November 2012

  • Alkohlani, Waleed; Cook, Jeanine
  • 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
  • DOI: 10.1109/SC.Companion.2012.62

Evaluation techniques for storage hierarchies
journal, January 1970

  • Mattson, R. L.; Gecsei, J.; Slutz, D. R.
  • IBM Systems Journal, Vol. 9, Issue 2
  • DOI: 10.1147/sj.92.0078

Automatically characterizing large scale program behavior
journal, December 2002

  • Sherwood, Timothy; Perelman, Erez; Hamerly, Greg
  • ACM SIGOPS Operating Systems Review, Vol. 36, Issue 5
  • DOI: 10.1145/635508.605403

Cache Utilization as a Locality Metric - A Case Study on the Mantevo Suite
conference, December 2016

  • Siddique, Nafiul Alam; Grubel, Patricia; Badawy, Abdel-Hameed A.
  • 2016 International Conference on Computational Science and Computational Intelligence (CSCI)
  • DOI: 10.1109/CSCI.2016.0110

A Benchmark Characterization of the EEMBC Benchmark Suite
journal, September 2009

  • Poovey, Jason A.; Conte, Thomas M.; Levy, Markus
  • IEEE Micro, Vol. 29, Issue 5
  • DOI: 10.1109/MM.2009.74

Controlling cache utilization of HPC applications
conference, January 2011

  • Perarnau, Swann; Tchiboukdjian, Marc; Huard, Guillaume
  • Proceedings of the international conference on Supercomputing - ICS '11
  • DOI: 10.1145/1995896.1995942

LMStr: exploring shared hardware controlled scratchpad memory for multicores
conference, January 2017

  • Siddique, Nafiul Alam; Badawy, Abdel-Hameed A.; Cook, Jeanine
  • Proceedings of the International Symposium on Memory Systems - MEMSYS '17
  • DOI: 10.1145/3132402.3132440

Data analytics workloads: Characterization and similarity analysis
conference, December 2014

  • Panda, Reena; John, Lizy Kurian
  • 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC)
  • DOI: 10.1109/PCCC.2014.7017065

GraphBIG: understanding graph computing in the context of industrial solutions
conference, January 2015

  • Nai, Lifeng; Xia, Yinglong; Tanase, Ilie G.
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis - SC '15
  • DOI: 10.1145/2807591.2807626

Measuring benchmark similarity using inherent program characteristics
journal, June 2006

  • Joshi, Ajay; Eeckhout, L.
  • IEEE Transactions on Computers, Vol. 55, Issue 6
  • DOI: 10.1109/TC.2006.85

Benchmark characterization
journal, January 1991

  • Conte, T. M.; Hwu, W. -M. W.
  • Computer, Vol. 24, Issue 1
  • DOI: 10.1109/2.67193

The PARSEC benchmark suite: characterization and architectural implications
conference, January 2008

  • Bienia, Christian; Kumar, Sanjeev; Singh, Jaswinder Pal
  • Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08
  • DOI: 10.1145/1454115.1454128

Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics
conference, October 2006

  • Hoste, Kenneth; Eeckhout, Lieven
  • 2006 IEEE International Symposium on Workload Characterization
  • DOI: 10.1109/IISWC.2006.302732

False sharing and spatial locality in multiprocessor caches
journal, June 1994

  • Torrellas, J.; Lam, H. S.; Hennessy, J. L.
  • IEEE Transactions on Computers, Vol. 43, Issue 6
  • DOI: 10.1109/12.286299

Performance Characterization of SPEC CPU2006 Integer Benchmarks on x86-64 Architecture
conference, October 2006

  • Ye, Dong; Ray, Joydeep; Harle, Christophe
  • 2006 IEEE International Symposium on Workload Characterization
  • DOI: 10.1109/IISWC.2006.302736

New tiling techniques to improve cache temporal locality
journal, May 1999