A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC
Abstract
Caches have long been used to reduce the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance, so significant research has gone into identifying the most important cache performance metrics. Although most of that research measures cache hit rates and data movement as the primary metrics, cache utilization is also important. We investigate application locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses, providing detailed insight into the dynamic behavior of parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, which are mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average, a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization as well as conventional performance metrics, which together give a holistic understanding of cache behavior. To facilitate this research, we built a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that variable cache line size can result in better performance and can also conserve power.
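The two utilization figures quoted in the abstract (the fraction of bytes in a line touched before eviction, and the average number of accesses per byte) can be illustrated with a toy model. The sketch below is not the authors' SST-based simulator. It assumes a direct-mapped cache with 64-byte lines, and it averages accesses over all bytes in each evicted line, which is one possible reading of the metric. The names `DirectMappedCache`, `Line`, and `utilization` are hypothetical.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # bytes per cache line (assumed; not taken from the paper)


@dataclass
class Line:
    tag: int
    # One access counter per byte in the line.
    touches: list = field(default_factory=lambda: [0] * LINE_SIZE)


class DirectMappedCache:
    """Toy direct-mapped cache that records per-byte accesses (hypothetical)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = {}    # set index -> resident Line
        self.evicted = []  # per-byte counters of every evicted line

    def access(self, addr, size=1):
        """Record an access of `size` bytes starting at byte address `addr`."""
        for a in range(addr, addr + size):
            block = a // LINE_SIZE
            index = block % self.num_lines
            line = self.lines.get(index)
            if line is None or line.tag != block:
                if line is not None:
                    # Conflict miss: snapshot the victim's counters.
                    self.evicted.append(line.touches)
                line = Line(tag=block)
                self.lines[index] = line
            line.touches[a % LINE_SIZE] += 1

    def flush(self):
        """Evict every resident line so it is included in the statistics."""
        self.evicted.extend(line.touches for line in self.lines.values())
        self.lines.clear()


def utilization(evicted):
    """Return (fraction of bytes touched, mean accesses per byte)."""
    total_bytes = sum(len(t) for t in evicted)
    if total_bytes == 0:
        return 0.0, 0.0
    used = sum(1 for t in evicted for count in t if count > 0)
    accesses = sum(sum(t) for t in evicted)
    return used / total_bytes, accesses / total_bytes


# Example: read 4 bytes out of every 16 across a 1 KiB region.
cache = DirectMappedCache(num_lines=4)
for addr in range(0, 1024, 16):
    cache.access(addr, size=4)
cache.flush()
frac_used, mean_accesses = utilization(cache.evicted)
print(frac_used, mean_accesses)
```

Computing the same two numbers over only the lines evicted within each fixed window of accesses, rather than once at the end, would give a time-varying view of utilization in the spirit of the study.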
- Authors:
- Siddique, Nafiul A.; Grubel, Patricia A.; Badawy, Abdel-Hameed A.; Cook, Jeanine
- New Mexico State Univ., Las Cruces, NM (United States). Klipsch School of Electrical and Computer Engineering
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Publication Date:
- 2017-09-20
- Research Org.:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Org.:
- USDOE Laboratory Directed Research and Development (LDRD) Program; U.S. Army Research Laboratory (ARL); National Science Foundation (NSF)
- OSTI Identifier:
- 1394977
- Alternate Identifier(s):
- OSTI ID: 1399561
- Report Number(s):
- LA-UR-17-24198; SAND-2017-8114J; Journal ID: ISSN 0920-8542
- Grant/Contract Number:
- AC52-06NA25396; W911NF-07-2-0027; AC04-94AL85000
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Supercomputing
- Additional Journal Information:
- Journal Volume: 74; Journal Issue: 2; Journal ID: ISSN 0920-8542
- Publisher:
- Springer
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Cache Utilization; Locality; Workload Characterization; Cache Line Utilization; Multicore Cache Simulation; Runtime Evaluation; Scratchpad
Citation Formats
Siddique, Nafiul A., Grubel, Patricia A., Badawy, Abdel-Hameed A., and Cook, Jeanine. A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC. United States: N. p., 2017. Web. doi:10.1007/s11227-017-2144-1.
Siddique, Nafiul A., Grubel, Patricia A., Badawy, Abdel-Hameed A., & Cook, Jeanine. A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC. United States. https://doi.org/10.1007/s11227-017-2144-1
Siddique, Nafiul A., Grubel, Patricia A., Badawy, Abdel-Hameed A., and Cook, Jeanine. September 20, 2017. "A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC". United States. https://doi.org/10.1007/s11227-017-2144-1. https://www.osti.gov/servlets/purl/1394977.
@article{osti_1394977,
title = {A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC},
author = {Siddique, Nafiul A. and Grubel, Patricia A. and Badawy, Abdel-Hameed A. and Cook, Jeanine},
abstractNote = {Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is significantly important. We investigate the application’s locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses providing detailed insights into the dynamic application behavior on parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization, as well as, conventional performance metrics that illustrate a holistic understanding of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that variable cache line size can result in better performance and can also conserve power.},
doi = {10.1007/s11227-017-2144-1},
journal = {Journal of Supercomputing},
number = 2,
volume = 74,
place = {United States},
year = {2017},
month = {9}
}
Works referenced in this record:
Basic block distribution analysis to find periodic behavior and simulation points in applications
conference, January 2001
- Sherwood, T.; Perelman, E.; Calder, B.
- Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques
Run-time spatial locality detection and optimization
conference, January 1997
- Johnson, T. L.; Merten, M. C.; Hwu, W. W.
- Proceedings of 30th Annual International Symposium on Microarchitecture
A New Metric to Measure Cache Utilization for HPC Workloads
conference, January 2016
- Deshpande, Aditya M.; Draper, Jeffrey T.
- Proceedings of the Second International Symposium on Memory Systems - MEMSYS '16
Hitting the memory wall: implications of the obvious
journal, March 1995
- Wulf, Wm. A.; McKee, Sally A.
- ACM SIGARCH Computer Architecture News, Vol. 23, Issue 1
Quantifying Locality In The Memory Access Patterns of HPC Applications
conference, January 2005
- Weinberg, J.; McCracken, M. O.; Strohmaier, E.
- ACM/IEEE SC 2005 Conference (SC'05)
Performance characterization of the NAS Parallel Benchmarks in OpenCL
conference, November 2011
- Seo, Sangmin; Jo, Gangwon; Lee, Jaejin
- 2011 IEEE International Symposium on Workload Characterization (IISWC)
Subsetting the SPEC CPU2006 benchmark suite
journal, March 2007
- Phansalkar, Aashish; Joshi, Ajay; John, Lizy K.
- ACM SIGARCH Computer Architecture News, Vol. 35, Issue 1
LMStr: Local memory store the case for hardware controlled scratchpad memory for general purpose processors
conference, December 2016
- Siddique, Nafiul Alam; Badawy, Abdel-Hameed A.; Cook, Jeanine
- 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC)
Energy, Power, and Performance Characterization of GPGPU Benchmark Programs
conference, May 2016
- Coplin, Jared; Burtscher, Martin
- 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite
journal, June 2007
- Phansalkar, Aashish; Joshi, Ajay; John, Lizy K.
- ACM SIGARCH Computer Architecture News, Vol. 35, Issue 2
The structural simulation toolkit
journal, March 2011
- Rodrigues, A. F.; Cooper-Balis, E.; Jacob, B.
- ACM SIGMETRICS Performance Evaluation Review, Vol. 38, Issue 4
Pin: building customized program analysis tools with dynamic instrumentation
conference, January 2005
- Luk, Chi-Keung; Cohn, Robert; Muth, Robert
- Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation - PLDI '05
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
conference, January 2002
- Banakar, Rajeshwari; Steinke, Stefan; Lee, Bo-Sik
- Proceedings of the tenth international symposium on Hardware/software codesign - CODES '02
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
conference, December 2012
- Kumar, Snehasish; Zhao, Hongzhou; Shriraman, Arrvindh
- 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor
journal, March 2010
- Conway, Pat; Kalyanasundharam, Nathan; Donley, Gregg
- IEEE Micro, Vol. 30, Issue 2
Performance Characterization of SPEC CPU2006 Benchmarks on Intel and AMD Platform
conference, March 2009
- Li, Shengmei; Cheng, Buqi; Gao, Xingyu
- 2009 First International Workshop on Education Technology and Computer Science
SPEClite: using representative samples to reduce SPEC CPU2000 workload
conference, January 2001
- Todi, R.
- Proceedings of the Fourth Annual IEEE International Workshop on Workload Characterization. WWC-4 (Cat. No.01EX538)
Predicting whole-program locality through reuse distance analysis
journal, May 2003
- Ding, Chen; Zhong, Yutao
- ACM SIGPLAN Notices, Vol. 38, Issue 5
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
conference, January 2004
- Patil, H.; Cohn, R.; Charney, M.
- 37th International Symposium on Microarchitecture (MICRO-37'04)
Exploiting spatial locality in data caches using spatial footprints
conference, January 1998
- Kumar, S.; Wilkerson, C.
- Proceedings. 25th Annual International Symposium on Computer Architecture - ISCA '98 (Cat. No.98CB36235)
Towards Performance Predictive Application-Dependent Workload Characterization
conference, November 2012
- Alkohlani, Waleed; Cook, Jeanine
- 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
Evaluation techniques for storage hierarchies
journal, January 1970
- Mattson, R. L.; Gecsei, J.; Slutz, D. R.
- IBM Systems Journal, Vol. 9, Issue 2
Automatically characterizing large scale program behavior
journal, December 2002
- Sherwood, Timothy; Perelman, Erez; Hamerly, Greg
- ACM SIGOPS Operating Systems Review, Vol. 36, Issue 5
Cache Utilization as a Locality Metric - A Case Study on the Mantevo Suite
conference, December 2016
- Siddique, Nafiul Alam; Grubel, Patricia; Badawy, Abdel-Hameed A.
- 2016 International Conference on Computational Science and Computational Intelligence (CSCI)
A Benchmark Characterization of the EEMBC Benchmark Suite
journal, September 2009
- Poovey, Jason A.; Conte, Thomas M.; Levy, Markus
- IEEE Micro, Vol. 29, Issue 5
Controlling cache utilization of HPC applications
conference, January 2011
- Perarnau, Swann; Tchiboukdjian, Marc; Huard, Guillaume
- Proceedings of the international conference on Supercomputing - ICS '11
LMStr: exploring shared hardware controlled scratchpad memory for multicores
conference, January 2017
- Siddique, Nafiul Alam; Badawy, Abdel-Hameed A.; Cook, Jeanine
- Proceedings of the International Symposium on Memory Systems - MEMSYS '17
Data analytics workloads: Characterization and similarity analysis
conference, December 2014
- Panda, Reena; John, Lizy Kurian
- 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC)
GraphBIG: understanding graph computing in the context of industrial solutions
conference, January 2015
- Nai, Lifeng; Xia, Yinglong; Tanase, Ilie G.
- Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis - SC '15
Measuring benchmark similarity using inherent program characteristics
journal, June 2006
- Joshi, Ajay; Eeckhout, L.
- IEEE Transactions on Computers, Vol. 55, Issue 6
Benchmark characterization
journal, January 1991
- Conte, T. M.; Hwu, W. -M. W.
- Computer, Vol. 24, Issue 1
The PARSEC benchmark suite: characterization and architectural implications
conference, January 2008
- Bienia, Christian; Kumar, Sanjeev; Singh, Jaswinder Pal
- Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08
Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics
conference, October 2006
- Hoste, Kenneth; Eeckhout, Lieven
- 2006 IEEE International Symposium on Workload Characterization
False sharing and spatial locality in multiprocessor caches
journal, June 1994
- Torrellas, J.; Lam, H. S.; Hennessy, J. L.
- IEEE Transactions on Computers, Vol. 43, Issue 6
Performance Characterization of SPEC CPU2006 Integer Benchmarks on x86-64 Architecture
conference, October 2006
- Ye, Dong; Ray, Joydeep; Harle, Christophe
- 2006 IEEE International Symposium on Workload Characterization
New tiling techniques to improve cache temporal locality
journal, May 1999
- Song, Yonghong; Li, Zhiyuan
- ACM SIGPLAN Notices, Vol. 34, Issue 5
DAdHTM: Low overhead dynamically adaptive hardware transactional memory for large graphs a scalability study
conference, August 2017
- Qayum, Mohammad; Badawy, Abdel-Hameed A.; Cook, Jeanine
- 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)
The time-varying nature of cache utilization: A case study on the Mantevo and Apex benchmarks
conference, August 2017
- Siddique, Nafiul Alam; Grubel, Patricia A.; Badawy, Abdel-Hameed A.
- 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)
Local memory store (LMStr): A hardware controlled shared scratchpad for multicores
conference, August 2017
- Siddique, Nafiul A.; Badawy, Abdel-Hameed A.; Cook, Jeanine
- 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)
Works referencing / citing this record:
Design trade-offs for emerging HPC processors based on mobile market technology
journal, March 2019
- Armejach, Adrià; Casas, Marc; Moretó, Miquel
- The Journal of Supercomputing, Vol. 75, Issue 9