Trends in data locality abstractions for HPC systems
Abstract
The cost of data movement has always been an important concern in high-performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the form of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.
- Authors:
- Koc Univ., Istanbul (Turkey)
- Argonne National Lab. (ANL), Lemont, IL (United States)
- ETH Zurich, Zurich (Switzerland)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- KTH Royal Institute of Technology, Solna (Sweden)
- Swiss National Supercomputer, Lugano (Switzerland)
- Cray Inc., Seattle, WA (United States)
- Intel Corp., Santa Clara, CA (United States)
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Ludwig-Maximilians-Univ., Munich (Germany)
- Univ. of Erlangen-Nuremberg, Erlangen (Germany)
- INRIA Bordeaux Sud-Ouest, Talence (France)
- Univ. of Michigan, Ann Arbor, MI (United States)
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Imperial College, London (United Kingdom)
- King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia)
- RIKEN, Hyogo (Japan)
- Nvidia Corp., Santa Clara, CA (United States)
- Chalmers Univ. of Technology, Goteborg (Sweden)
- Publication Date:
- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Argonne National Laboratory (ANL), Argonne, IL (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); German Research Foundation (DFG)
- OSTI Identifier:
- 1356837
- Alternate Identifier(s):
- OSTI ID: 1393262; OSTI ID: 1525244
- Report Number(s):
- SAND-2017-3844J
Journal ID: ISSN 1045-9219; 652425
- Grant/Contract Number:
- AC04-94AL85000; AC02-06CH11357; AC02-05CH11231
- Resource Type:
- Accepted Manuscript
- Journal Name:
- IEEE Transactions on Parallel and Distributed Systems
- Additional Journal Information:
- Journal Volume: 28; Journal Issue: 10; Journal ID: ISSN 1045-9219
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; data locality; programming abstractions; high-performance computing; data layout; locality-aware runtimes
Citation Formats
Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., and Pericas, Miquel. Trends in data locality abstractions for HPC systems. United States: N. p., 2017.
Web. doi:10.1109/tpds.2017.2703149.
Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., & Pericas, Miquel. Trends in data locality abstractions for HPC systems. United States. https://doi.org/10.1109/tpds.2017.2703149
Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., and Pericas, Miquel. Wed .
"Trends in data locality abstractions for HPC systems". United States. https://doi.org/10.1109/tpds.2017.2703149. https://www.osti.gov/servlets/purl/1356837.
@article{osti_1356837,
title = {Trends in data locality abstractions for HPC systems},
author = {Unat, Didem and Dubey, Anshu and Hoefler, Torsten and Shalf, John and Abraham, Mark and Bianco, Mauro and Chamberlain, Bradford L. and Cledat, Romain and Edwards, H. Carter and Finkel, Hal and Fuerlinger, Karl and Hannig, Frank and Jeannot, Emmanuel and Kamil, Amir and Keasler, Jeff and Kelly, Paul H. J. and Leung, Vitus and Ltaief, Hatem and Maruyama, Naoya and Newburn, Chris J. and Pericas, Miquel},
abstractNote = {The cost of data movement has always been an important concern in high-performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the form of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.},
doi = {10.1109/tpds.2017.2703149},
journal = {IEEE Transactions on Parallel and Distributed Systems},
number = 10,
volume = 28,
place = {United States},
year = {2017},
month = {5}
}
Works referenced in this record:
Polly-ACC Transparent compilation to heterogeneous hardware
conference, June 2016
- Grosser, Tobias; Hoefler, Torsten
- Proceedings of the 2016 International Conference on Supercomputing
Designing a unified programming model for heterogeneous machines
conference, November 2012
- Garland, Michael; Kudlur, Manjunath; Zheng, Yili
- 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
Communication and topology-aware load balancing in Charm++ with TreeMatch
conference, September 2013
- Jeannot, Emmanuel; Meneses, Esteban; Mercier, Guillaume
- 2013 IEEE International Conference on Cluster Computing (CLUSTER)
The Organization of Computations for Uniform Recurrence Equations
journal, July 1967
- Karp, Richard M.; Miller, Raymond E.; Winograd, Shmuel
- Journal of the ACM, Vol. 14, Issue 3
A practical automatic polyhedral parallelizer and locality optimizer
journal, May 2008
- Bondhugula, Uday; Hartono, Albert; Ramanujam, J.
- ACM SIGPLAN Notices, Vol. 43, Issue 6
Netloc: Towards a Comprehensive View of the HPC System Topology
conference, September 2014
- Goglin, Brice; Hursey, Joshua; Squyres, Jeffrey M.
- 2014 43rd International Conference on Parallel Processing Workshops
Modesto
conference, June 2015
- Gysi, Tobias; Grosser, Tobias; Hoefler, Torsten
- Proceedings of the 29th ACM on International Conference on Supercomputing
The Scalasca performance toolset architecture
journal, January 2010
- Geimer, Markus; Wolf, Felix; Wylie, Brian J. N.
- Concurrency and Computation: Practice and Experience
Active Libraries: Rethinking the roles of compilers and libraries
preprint, January 1998
- Veldhuizen, Todd L.; Gannon, Dennis
- arXiv
Slim Fly: A Cost Effective Low-Diameter Network Topology
text, January 2019
- Besta, Maciej; Hoefler, Torsten
- arXiv
A survey of high level frameworks in block-structured adaptive mesh refinement packages
journal, December 2014
- Dubey, Anshu; Almgren, Ann; Bell, John
- Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
Software Engineering for Computational Science and Engineering
journal, March 2012
- Carver, Jeffrey C.
- Computing in Science & Engineering, Vol. 14, Issue 2
The ASC-Alliance Projects: A Case Study of Large-Scale Parallel Scientific Code Development
journal, March 2008
- Hochstein, L.; Basili, V. R.
- Computer, Vol. 41, Issue 3
Developing Scientific Software
journal, July 2008
- Segal, Judith; Morris, Chris
- IEEE Software, Vol. 25, Issue 4
Scalable molecular dynamics with NAMD
journal, January 2005
- Phillips, James C.; Braun, Rosemary; Wang, Wei
- Journal of Computational Chemistry, Vol. 26, Issue 16, p. 1781-1802
A component-based architecture for parallel multi-physics PDE simulation
journal, January 2006
- Parker, Steven G.
- Future Generation Computer Systems, Vol. 22, Issue 1-2
Legion: Expressing locality and independence with logical regions
conference, November 2012
- Bauer, Michael; Treichler, Sean; Slaughter, Elliott
- 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
Polyhedral parallel code generation for CUDA
journal, January 2013
- Verdoolaege, Sven; Carlos Juega, Juan; Cohen, Albert
- ACM Transactions on Architecture and Code Optimization, Vol. 9, Issue 4
Cache-oblivious algorithms
conference, January 1999
- Frigo, M.; Leiserson, C. E.; Prokop, H.
- 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)
A batch scheduler with high level components
conference, January 2005
- Capit, N.; Da Costa, G.; Georgiou, Y.
- CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, 2005.
Job scheduling under the Portable Batch System
book, January 1995
- Henderson, Robert L.
- Job Scheduling Strategies for Parallel Processing
UPC++: A PGAS Extension for C++
conference, May 2014
- Zheng, Yili; Kamil, Amir; Driscoll, Michael B.
- 2014 IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2014 IEEE 28th International Parallel and Distributed Processing Symposium
A practical automatic polyhedral parallelizer and locality optimizer
conference, January 2008
- Bondhugula, Uday; Hartono, Albert; Ramanujam, J.
- Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation - PLDI '08
On implementing MPI-IO portably and with high performance
conference, January 1999
- Thakur, Rajeev; Gropp, William; Lusk, Ewing
- Proceedings of the sixth workshop on I/O in parallel and distributed systems - IOPADS '99
Parallel netCDF: A High-Performance Scientific I/O Interface
conference, January 2003
- Li, Jianwei; Zingale, Michael; Liao, Wei-keng
- Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03
Exascale Computing Trends: Adjusting to the "New Normal" for Computer Architecture
journal, November 2013
- Kogge, Peter; Shalf, John
- Computing in Science & Engineering, Vol. 15, Issue 6
Co-array Fortran for parallel programming
journal, August 1998
- Numrich, Robert W.; Reid, John
- ACM SIGPLAN Fortran Forum, Vol. 17, Issue 2
ZPL: a machine independent programming language for parallel computers
journal, March 2000
- Chamberlain, B. L.; Lewis, C.
- IEEE Transactions on Software Engineering, Vol. 26, Issue 3
Titanium: a high-performance Java dialect
journal, September 1998
- Yelick, Kathy; Semenzato, Luigi; Pike, Geoff
- Concurrency: Practice and Experience, Vol. 10, Issue 11-13
CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination
conference, May 2014
- Dorier, Matthieu; Antoniu, Gabriel; Ross, Rob
- 2014 IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2014 IEEE 28th International Parallel and Distributed Processing Symposium
MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory
journal, May 2013
- Hoefler, Torsten; Dinan, James; Buntinas, Darius
- Computing, Vol. 95, Issue 12
SLURM: Simple Linux Utility for Resource Management
book, January 2003
- Yoo, Andy B.; Jette, Morris A.; Grondona, Mark
- Job Scheduling Strategies for Parallel Processing
Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc)
conference, July 2014
- Goglin, Brice
- 2014 International Conference on High Performance Computing & Simulation (HPCS)
DASH: Data Structures and Algorithms with Support for Hierarchical Locality
book, January 2014
- Fürlinger, Karl; Glass, Colin; Gracia, Jose
- Lecture Notes in Computer Science
BoxLib with Tiling: An Adaptive Mesh Refinement Software Framework
journal, January 2016
- Zhang, Weiqun; Almgren, Ann; Day, Marcus
- SIAM Journal on Scientific Computing, Vol. 38, Issue 5
A new vision for coarray Fortran
conference, January 2009
- Mellor-Crummey, John; Adhianto, Laksono; Scherer, William N.
- Proceedings of the Third Conference on Partitioned Global Address Space Programing Models - PGAS '09
Parallel Programmability and the Chapel Language
journal, August 2007
- Chamberlain, B. L.; Callahan, D.; Zima, H. P.
- The International Journal of High Performance Computing Applications, Vol. 21, Issue 3
Programming for parallelism and locality with hierarchically tiled arrays
conference, January 2006
- Bikshandi, Ganesh; Guo, Jia; Hoeflinger, Daniel
- Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '06
Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers
conference, November 2016
- Tessier, Francois; Malakar, Preeti; Vishwanath, Venkatram
- 2016 First International Workshop on Communication Optimizations in HPC (COMHPC)
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
journal, December 2014
- Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel
- Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks
conference, June 2014
- Prisacari, Bogdan; Rodriguez, German; Heidelberger, Philip
- Proceedings of the 23rd international symposium on High-performance parallel and distributed computing
Asymptotically Optimal Load Balancing for Hierarchical Multi-Core Systems
conference, December 2012
- Pilla, Laercio L.; Navaux, Philippe O. A.; Ribeiro, Christiane P.
- 2012 IEEE 18th International Conference on Parallel and Distributed Systems
Work-stealing with configurable scheduling strategies
conference, February 2013
- Wimmer, Martin; Cederman, Daniel; Träff, Jesper Larsson
- Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
OpenMP task scheduling strategies for multicore NUMA systems
journal, February 2012
- Olivier, Stephen L.; Porterfield, Allan K.; Wheeler, Kyle B.
- The International Journal of High Performance Computing Applications, Vol. 26, Issue 2
Technology-Driven, Highly-Scalable Dragonfly Topology
journal, June 2008
- Kim, John; Dally, William J.; Scott, Steve
- ACM SIGARCH Computer Architecture News, Vol. 36, Issue 3
Design and implementation of a customizable work stealing scheduler
conference, June 2013
- Nakashima, Jun; Nakatani, Sho; Taura, Kenjiro
- Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
Works referencing / citing this record:
swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures
journal, March 2018
- Wang, Xinliang; Liu, Weifeng; Xue, Wei
- ACM SIGPLAN Notices, Vol. 53, Issue 1
EagerMap
journal, December 2018
- Cruz, Eduardo H. M.; Diener, Matthias; Pilla, Laércio L.
- ACM Transactions on Parallel Computing, Vol. 5, Issue 4
Memory‐aware kernel mechanism and policies for improving internode load balancing on NUMA systems
journal, July 2019
- Chiang, Mei‐Ling; Su, Wei‐Lun; Tu, Shu‐Wei
- Software: Practice and Experience, Vol. 49, Issue 10
Data Movement Is All You Need: A Case Study on Optimizing Transformers
preprint, January 2020
- Ivanov, Andrei; Dryden, Nikoli; Ben-Nun, Tal
- arXiv
swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures
conference, February 2018
- Wang, Xinliang; Liu, Weifeng; Xue, Wei
- PPoPP '18: 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Impact study of data locality on task-based applications through the Heteroprio scheduler
journal, January 2019
- Bramas, Bérenger
- PeerJ Computer Science, Vol. 5
A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations
text, January 2019
- Ziogas, Alexandros Nikolaos; Ben-Nun, Tal; Fernández, Guillermo Indalecio
- arXiv
The future of computing beyond Moore’s Law
journal, January 2020
- Shalf, John
- Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166