A programmable shared-memory system for an array of processing-in-memory devices
Abstract
Processing in memory (PIM), the concept of integrating processing directly with memory, has been attracting a lot of attention, since PIM can help overcome the throughput limitation caused by data movement between the CPU and memory. The challenge, however, is that PIM requires programmers to have a deep understanding of the PIM architecture to maximize benefits such as data locality and parallel thread execution on multiple PIM devices. In this study, we present AnalyzeThat, a programmable shared-memory system for parallel data processing with PIM devices. Central to AnalyzeThat is a rich PIM-aware data structure (PADS), an encapsulation that integrally ties together the data, the analysis tasks, and the runtime needed to interface with the PIM device array. The PADS abstraction provides (i) a sophisticated key-value data container that allows programmers to easily store data on multiple PIMs, (ii) a suite of parallel operations with which users can easily implement data analysis applications, and (iii) a runtime, hidden from programmers, that provides the mechanisms needed to overlay both the data and the tasks on the PIM device array in an intelligent fashion, based on PIM-specific information collected from the hardware. We have developed a PIM emulation framework called AnalyzeThat. Our experimental evaluation with representative data analytics applications suggests that the proposed system can significantly reduce the PIM programming effort without losing the benefits of the technology.
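The abstract describes the PADS container, its parallel operations, and its hidden runtime only at a high level. As an illustration, and not the authors' implementation, the Python sketch below assumes a simple model: each PIM device is emulated by a local dictionary with a fixed pair capacity, the runtime places a new key on its hash-based home device and falls back to the least-loaded device when the home is full (a stand-in for "PIM-specific information collected from the hardware"), and a map-reduce style operation runs on every device's local data before the host combines the partial results. The names `PimDevice`, `PimKeyValueContainer`, `put`, `get`, `map_reduce`, and the capacity metric are all hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from functools import reduce
from typing import Any, Callable, Dict, Hashable, List


@dataclass
class PimDevice:
    """Hypothetical emulated PIM device: a local key-value store with fixed capacity."""
    device_id: int
    capacity: int  # assumed capacity metric: maximum number of stored pairs
    store: Dict[Hashable, Any] = field(default_factory=dict)

    def free_slots(self) -> int:
        return self.capacity - len(self.store)


class PimKeyValueContainer:
    """Hypothetical PADS-style container that spreads key-value pairs over a PIM array."""

    def __init__(self, devices: List[PimDevice]) -> None:
        if not devices:
            raise ValueError("at least one PIM device is required")
        self.devices = devices
        # Runtime-side directory, hidden from the caller: key -> index of owning device.
        self.directory: Dict[Hashable, int] = {}

    def _choose_device(self, key: Hashable) -> int:
        # Prefer the key's hash-based home device; if it is full, fall back to the
        # device with the most free slots (a stand-in for hardware-reported state).
        home = zlib.crc32(repr(key).encode()) % len(self.devices)
        if self.devices[home].free_slots() > 0:
            return home
        best = max(range(len(self.devices)), key=lambda i: self.devices[i].free_slots())
        if self.devices[best].free_slots() <= 0:
            raise MemoryError("all PIM devices are full")
        return best

    def put(self, key: Hashable, value: Any) -> None:
        idx = self.directory.get(key)
        if idx is None:  # new key: let the runtime pick a device
            idx = self._choose_device(key)
            self.directory[key] = idx
        self.devices[idx].store[key] = value  # existing key: update in place

    def get(self, key: Hashable) -> Any:
        return self.devices[self.directory[key]].store[key]

    def map_reduce(self, mapper: Callable[[Hashable, Any], Any],
                   combine: Callable[[Any, Any], Any], initial: Any) -> Any:
        # Each device maps and combines only its local pairs, so data does not move;
        # the host then combines the per-device partials. `combine` should be
        # associative with `initial` as its identity so placement does not matter.
        def run_on_device(dev: PimDevice) -> Any:
            return reduce(combine, (mapper(k, v) for k, v in dev.store.items()), initial)

        # Threads emulate the devices executing concurrently.
        with ThreadPoolExecutor(max_workers=len(self.devices)) as pool:
            partials = list(pool.map(run_on_device, self.devices))
        return reduce(combine, partials, initial)


if __name__ == "__main__":
    kv = PimKeyValueContainer([PimDevice(i, capacity=4) for i in range(3)])
    for word in ["pim", "memory", "data", "array", "device", "kernel"]:
        kv.put(word, len(word))
    # Sum of word lengths computed device by device: 3 + 6 + 4 + 5 + 6 + 6
    print(kv.map_reduce(lambda k, v: v, lambda a, b: a + b, 0))  # prints 30
```

Keeping an explicit key-to-device directory is one possible design choice: it lets placement depart from pure hashing when a device fills up, at the cost of a host-side lookup on every access. The thread pool only emulates concurrent devices; it says nothing about how the actual AnalyzeThat runtime schedules tasks.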
- Authors:
- Lee, Sangkuen; Sim, Hyogi; Kim, Youngjae; Vazhkudai, Sudharshan S.
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sogang Univ., Seoul (Republic of Korea)
- Publication Date:
- August 2018
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1468266
- Grant/Contract Number:
- AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Cluster Computing
- Additional Journal Information:
- Journal Volume: 22; Journal ID: ISSN 1386-7857
- Publisher:
- Springer
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Programmable devices; Storage systems; Processing-in-memory; Big data processing
Citation Formats
Lee, Sangkuen, Sim, Hyogi, Kim, Youngjae, and Vazhkudai, Sudharshan S. A programmable shared-memory system for an array of processing-in-memory devices. United States: N. p., 2018. Web. doi:10.1007/s10586-018-2844-1.
Lee, Sangkuen, Sim, Hyogi, Kim, Youngjae, & Vazhkudai, Sudharshan S. (2018). A programmable shared-memory system for an array of processing-in-memory devices. United States. https://doi.org/10.1007/s10586-018-2844-1
Lee, Sangkuen, Sim, Hyogi, Kim, Youngjae, and Vazhkudai, Sudharshan S. 2018. "A programmable shared-memory system for an array of processing-in-memory devices". United States. https://doi.org/10.1007/s10586-018-2844-1. https://www.osti.gov/servlets/purl/1468266.
@article{osti_1468266,
title = {A programmable shared-memory system for an array of processing-in-memory devices},
author = {Lee, Sangkuen and Sim, Hyogi and Kim, Youngjae and Vazhkudai, Sudharshan S.},
abstractNote = {Processing in memory (PIM), the concept of integrating processing directly with memory, has been attracting a lot of attention, since PIM can help overcome the throughput limitation caused by data movement between the CPU and memory. The challenge, however, is that PIM requires programmers to have a deep understanding of the PIM architecture to maximize benefits such as data locality and parallel thread execution on multiple PIM devices. In this study, we present AnalyzeThat, a programmable shared-memory system for parallel data processing with PIM devices. Central to AnalyzeThat is a rich PIM-aware data structure (PADS), an encapsulation that integrally ties together the data, the analysis tasks, and the runtime needed to interface with the PIM device array. The PADS abstraction provides (i) a sophisticated key-value data container that allows programmers to easily store data on multiple PIMs, (ii) a suite of parallel operations with which users can easily implement data analysis applications, and (iii) a runtime, hidden from programmers, that provides the mechanisms needed to overlay both the data and the tasks on the PIM device array in an intelligent fashion, based on PIM-specific information collected from the hardware. We have developed a PIM emulation framework called AnalyzeThat. Our experimental evaluation with representative data analytics applications suggests that the proposed system can significantly reduce the PIM programming effort without losing the benefits of the technology.},
doi = {10.1007/s10586-018-2844-1},
journal = {Cluster Computing},
volume = 22,
place = {United States},
year = {2018},
month = {8}
}