DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Trends in data locality abstractions for HPC systems

Abstract

The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.

Authors:
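Unat, Didem [1]; Dubey, Anshu [2]; Hoefler, Torsten [3]; Shalf, John [4]; Abraham, Mark [5]; Bianco, Mauro [6]; Chamberlain, Bradford L. [7]; Cledat, Romain [8]; Edwards, H. Carter [9]; Finkel, Hal [10]; Fuerlinger, Karl [11]; Hannig, Frank [12]; Jeannot, Emmanuel [13]; Kamil, Amir [14]; Keasler, Jeff [15]; Kelly, Paul H. J. [16]; Leung, Vitus [9]; Ltaief, Hatem [17]; Maruyama, Naoya [18]; Newburn, Chris J. [19]; Pericas, Miquel [20]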
  1. Koc Univ., Istanbul (Turkey)
  2. Argonne National Lab. (ANL), Lemont, IL (United States)
  3. ETH Zurich, Zurich (Switzerland)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  5. KTH Royal Institute of Technology, Solna (Sweden)
  6. Swiss National Supercomputer, Lugano (Switzerland)
  7. Cray Inc., Seattle, WA (United States)
  8. Intel Corp., Santa Clara, CA (United States)
  9. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  10. Argonne National Lab. (ANL), Argonne, IL (United States)
  11. Ludwig-Maximilians-Univ., Munich (Germany)
  12. Univ. of Erlangen-Nuremberg, Erlangen (Germany)
  13. INRIA Bordeaux Sud-Ouest, Talence (France)
  14. Univ. of Michigan, Ann Arbor, MI (United States)
  15. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  16. Imperial College, London (United Kingdom)
  17. King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia)
  18. RIKEN, Hyogo (Japan)
  19. Nvidia Corp., Santa Clara, CA (United States)
  20. Chalmers Univ. of Technology, Goteborg (Sweden)
Publication Date:
2017-05-10
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Argonne National Laboratory (ANL), Argonne, IL (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); German Research Foundation (DFG)
OSTI Identifier:
1356837
Alternate Identifier(s):
OSTI ID: 1393262; OSTI ID: 1525244
Report Number(s):
SAND-2017-3844J
Journal ID: ISSN 1045-9219; 652425
Grant/Contract Number:  
AC04-94AL85000; AC02-06CH11357; AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Transactions on Parallel and Distributed Systems
Additional Journal Information:
Journal Volume: 28; Journal Issue: 10; Journal ID: ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; data locality; programming abstractions; high-performance computing; data layout; locality-aware runtimes

Citation Formats

Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., and Pericas, Miquel. Trends in data locality abstractions for HPC systems. United States: N. p., 2017. Web. doi:10.1109/tpds.2017.2703149.
Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., & Pericas, Miquel. Trends in data locality abstractions for HPC systems. United States. https://doi.org/10.1109/tpds.2017.2703149
Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., and Pericas, Miquel. 2017. "Trends in data locality abstractions for HPC systems". United States. https://doi.org/10.1109/tpds.2017.2703149. https://www.osti.gov/servlets/purl/1356837.
@article{osti_1356837,
title = {Trends in data locality abstractions for HPC systems},
author = {Unat, Didem and Dubey, Anshu and Hoefler, Torsten and Shalf, John and Abraham, Mark and Bianco, Mauro and Chamberlain, Bradford L. and Cledat, Romain and Edwards, H. Carter and Finkel, Hal and Fuerlinger, Karl and Hannig, Frank and Jeannot, Emmanuel and Kamil, Amir and Keasler, Jeff and Kelly, Paul H. J. and Leung, Vitus and Ltaief, Hatem and Maruyama, Naoya and Newburn, Chris J. and Pericas, Miquel},
abstractNote = {The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.},
doi = {10.1109/tpds.2017.2703149},
journal = {IEEE Transactions on Parallel and Distributed Systems},
number = 10,
volume = 28,
place = {United States},
year = {2017},
month = {5}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 49 works
Citation information provided by
Web of Science

Figures / Tables:

Fig. 1: Illustration of concepts that are important for data locality for a dense two-dimensional array. Example iteration space, traversal order, decomposition, data placement, and data layout are shown.
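
The five concepts named in the Fig. 1 caption can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the array size, tile size, worker count, and all function names are hypothetical choices made for this example, and the round-robin placement is just one simple policy among many.

```python
from itertools import product

# Hypothetical sizes chosen only for illustration.
ROWS, COLS = 4, 6        # iteration space: all (i, j) with 0 <= i < ROWS, 0 <= j < COLS
TILE_R, TILE_C = 2, 3    # decomposition: the array is split into 2x3 tiles
NUM_WORKERS = 2          # assumed number of processes or memory domains


def row_major(i, j):
    """Data layout: linear offset when each row is contiguous in memory."""
    return i * COLS + j


def col_major(i, j):
    """Data layout: linear offset when each column is contiguous in memory."""
    return j * ROWS + i


def tile_of(i, j):
    """Decomposition: the (tile_row, tile_col) that contains element (i, j)."""
    return i // TILE_R, j // TILE_C


def owner_of(tile):
    """Data placement: assign tiles to workers round-robin (one simple policy)."""
    tiles_per_row = COLS // TILE_C
    tile_id = tile[0] * tiles_per_row + tile[1]
    return tile_id % NUM_WORKERS


def tiled_traversal():
    """Traversal order: visit tile by tile, row by row inside each tile."""
    for ti, tj in product(range(ROWS // TILE_R), range(COLS // TILE_C)):
        for i in range(ti * TILE_R, (ti + 1) * TILE_R):
            for j in range(tj * TILE_C, (tj + 1) * TILE_C):
                yield i, j


if __name__ == "__main__":
    print(list(tiled_traversal())[:6])              # elements of the first tile
    print(row_major(1, 2), col_major(1, 2))         # same element, two layouts
    print(tile_of(3, 4), owner_of(tile_of(3, 4)))   # tile and its assigned worker
```

Keeping layout, decomposition, placement, and traversal as separate functions mirrors the separation of concerns that locality abstractions aim for: each can be changed without touching the others.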

Works referenced in this record:

Polly-ACC: Transparent compilation to heterogeneous hardware
conference, June 2016

  • Grosser, Tobias; Hoefler, Torsten
  • Proceedings of the 2016 International Conference on Supercomputing
  • DOI: 10.1145/2925426.2926286

Designing a unified programming model for heterogeneous machines
conference, November 2012

  • Garland, Michael; Kudlur, Manjunath; Zheng, Yili
  • 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
  • DOI: 10.1109/sc.2012.48

Communication and topology-aware load balancing in Charm++ with TreeMatch
conference, September 2013

  • Jeannot, Emmanuel; Meneses, Esteban; Mercier, Guillaume
  • 2013 IEEE International Conference on Cluster Computing (CLUSTER)
  • DOI: 10.1109/cluster.2013.6702666

The Organization of Computations for Uniform Recurrence Equations
journal, July 1967

  • Karp, Richard M.; Miller, Raymond E.; Winograd, Shmuel
  • Journal of the ACM, Vol. 14, Issue 3
  • DOI: 10.1145/321406.321418

A practical automatic polyhedral parallelizer and locality optimizer
journal, May 2008

  • Bondhugula, Uday; Hartono, Albert; Ramanujam, J.
  • ACM SIGPLAN Notices, Vol. 43, Issue 6
  • DOI: 10.1145/1379022.1375595

Netloc: Towards a Comprehensive View of the HPC System Topology
conference, September 2014

  • Goglin, Brice; Hursey, Joshua; Squyres, Jeffrey M.
  • 2014 43rd International Conference on Parallel Processing Workshops
  • DOI: 10.1109/icppw.2014.38

Modesto
conference, June 2015

  • Gysi, Tobias; Grosser, Tobias; Hoefler, Torsten
  • Proceedings of the 29th ACM on International Conference on Supercomputing
  • DOI: 10.1145/2751205.2751223

The Scalasca performance toolset architecture
journal, January 2010

  • Geimer, Markus; Wolf, Felix; Wylie, Brian J. N.
  • Concurrency and Computation: Practice and Experience
  • DOI: 10.1002/cpe.1556

Active Libraries: Rethinking the roles of compilers and libraries
preprint, January 1998


Slim Fly: A Cost Effective Low-Diameter Network Topology
text, January 2019


A survey of high level frameworks in block-structured adaptive mesh refinement packages
journal, December 2014

  • Dubey, Anshu; Almgren, Ann; Bell, John
  • Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
  • DOI: 10.1016/j.jpdc.2014.07.001

Software Engineering for Computational Science and Engineering
journal, March 2012


The ASC-Alliance Projects: A Case Study of Large-Scale Parallel Scientific Code Development
journal, March 2008


Developing Scientific Software
journal, July 2008


Scalable molecular dynamics with NAMD
journal, January 2005

  • Phillips, James C.; Braun, Rosemary; Wang, Wei
  • Journal of Computational Chemistry, Vol. 26, Issue 16, p. 1781-1802
  • DOI: 10.1002/jcc.20289

A component-based architecture for parallel multi-physics PDE simulation
journal, January 2006


Legion: Expressing locality and independence with logical regions
conference, November 2012

  • Bauer, Michael; Treichler, Sean; Slaughter, Elliott
  • 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
  • DOI: 10.1109/SC.2012.71

Polyhedral parallel code generation for CUDA
journal, January 2013

  • Verdoolaege, Sven; Carlos Juega, Juan; Cohen, Albert
  • ACM Transactions on Architecture and Code Optimization, Vol. 9, Issue 4
  • DOI: 10.1145/2400682.2400713

Cache-oblivious algorithms
conference, January 1999

  • Frigo, M.; Leiserson, C. E.; Prokop, H.
  • 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)
  • DOI: 10.1109/SFFCS.1999.814600

A batch scheduler with high level components
conference, January 2005

  • Capit, N.; Da Costa, G.; Georgiou, Y.
  • CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, 2005.
  • DOI: 10.1109/CCGRID.2005.1558641

Job scheduling under the Portable Batch System
book, January 1995


UPC++: A PGAS Extension for C++
conference, May 2014

  • Zheng, Yili; Kamil, Amir; Driscoll, Michael B.
  • 2014 IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2014.115

A practical automatic polyhedral parallelizer and locality optimizer
conference, January 2008

  • Bondhugula, Uday; Hartono, Albert; Ramanujam, J.
  • Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation - PLDI '08
  • DOI: 10.1145/1375581.1375595

On implementing MPI-IO portably and with high performance
conference, January 1999

  • Thakur, Rajeev; Gropp, William; Lusk, Ewing
  • Proceedings of the sixth workshop on I/O in parallel and distributed systems - IOPADS '99
  • DOI: 10.1145/301816.301826

Parallel netCDF: A High-Performance Scientific I/O Interface
conference, January 2003

  • Li, Jianwei; Zingale, Michael; Liao, Wei-keng
  • Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03
  • DOI: 10.1145/1048935.1050189

Exascale Computing Trends: Adjusting to the "New Normal" for Computer Architecture
journal, November 2013

  • Kogge, Peter; Shalf, John
  • Computing in Science & Engineering, Vol. 15, Issue 6
  • DOI: 10.1109/MCSE.2013.95

Co-array Fortran for parallel programming
journal, August 1998


ZPL: a machine independent programming language for parallel computers
journal, March 2000

  • Chamberlain, B. L.; Lewis, C.
  • IEEE Transactions on Software Engineering, Vol. 26, Issue 3
  • DOI: 10.1109/32.842947

Titanium: a high-performance Java dialect
journal, September 1998


CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination
conference, May 2014

  • Dorier, Matthieu; Antoniu, Gabriel; Ross, Rob
  • 2014 IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2014.27

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory
journal, May 2013


SLURM: Simple Linux Utility for Resource Management
book, January 2003

  • Yoo, Andy B.; Jette, Morris A.; Grondona, Mark
  • Job Scheduling Strategies for Parallel Processing
  • DOI: 10.1007/10968987_3

Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc)
conference, July 2014


DASH: Data Structures and Algorithms with Support for Hierarchical Locality
book, January 2014


BoxLib with Tiling: An Adaptive Mesh Refinement Software Framework
journal, January 2016

  • Zhang, Weiqun; Almgren, Ann; Day, Marcus
  • SIAM Journal on Scientific Computing, Vol. 38, Issue 5
  • DOI: 10.1137/15M102616X

A new vision for coarray Fortran
conference, January 2009

  • Mellor-Crummey, John; Adhianto, Laksono; Scherer, William N.
  • Proceedings of the Third Conference on Partitioned Global Address Space Programing Models - PGAS '09
  • DOI: 10.1145/1809961.1809969

Parallel Programmability and the Chapel Language
journal, August 2007

  • Chamberlain, B. L.; Callahan, D.; Zima, H. P.
  • The International Journal of High Performance Computing Applications, Vol. 21, Issue 3
  • DOI: 10.1177/1094342007078442

Programming for parallelism and locality with hierarchically tiled arrays
conference, January 2006

  • Bikshandi, Ganesh; Guo, Jia; Hoeflinger, Daniel
  • Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '06
  • DOI: 10.1145/1122971.1122981

Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers
conference, November 2016

  • Tessier, Francois; Malakar, Preeti; Vishwanath, Venkatram
  • 2016 First International Workshop on Communication Optimizations in HPC (COMHPC)
  • DOI: 10.1109/COMHPC.2016.013

Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
journal, December 2014

  • Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel
  • Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
  • DOI: 10.1016/j.jpdc.2014.07.003

Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks
conference, June 2014

  • Prisacari, Bogdan; Rodriguez, German; Heidelberger, Philip
  • Proceedings of the 23rd international symposium on High-performance parallel and distributed computing
  • DOI: 10.1145/2600212.2600225

Asymptotically Optimal Load Balancing for Hierarchical Multi-Core Systems
conference, December 2012

  • Pilla, Laercio L.; Navaux, Philippe O. A.; Ribeiro, Christiane P.
  • 2012 IEEE 18th International Conference on Parallel and Distributed Systems
  • DOI: 10.1109/ICPADS.2012.41

Work-stealing with configurable scheduling strategies
conference, February 2013

  • Wimmer, Martin; Cederman, Daniel; Träff, Jesper Larsson
  • Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
  • DOI: 10.1145/2442516.2442562

OpenMP task scheduling strategies for multicore NUMA systems
journal, February 2012

  • Olivier, Stephen L.; Porterfield, Allan K.; Wheeler, Kyle B.
  • The International Journal of High Performance Computing Applications, Vol. 26, Issue 2
  • DOI: 10.1177/1094342011434065

Technology-Driven, Highly-Scalable Dragonfly Topology
journal, June 2008

  • Kim, John; Dally, William J.; Scott, Steve
  • ACM SIGARCH Computer Architecture News, Vol. 36, Issue 3
  • DOI: 10.1145/1394608.1382129

Design and implementation of a customizable work stealing scheduler
conference, June 2013

  • Nakashima, Jun; Nakatani, Sho; Taura, Kenjiro
  • Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
  • DOI: 10.1145/2491661.2481433

Works referencing / citing this record:

swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures
journal, March 2018


EagerMap
journal, December 2018

  • Cruz, Eduardo H. M.; Diener, Matthias; Pilla, Laércio L.
  • ACM Transactions on Parallel Computing, Vol. 5, Issue 4
  • DOI: 10.1145/3309711

Memory‐aware kernel mechanism and policies for improving internode load balancing on NUMA systems
journal, July 2019

  • Chiang, Mei‐Ling; Su, Wei‐Lun; Tu, Shu‐Wei
  • Software: Practice and Experience, Vol. 49, Issue 10
  • DOI: 10.1002/spe.2731

Data Movement Is All You Need: A Case Study on Optimizing Transformers
preprint, January 2020


swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures
conference, February 2018

  • Wang, Xinliang; Liu, Weifeng; Xue, Wei
  • PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
  • DOI: 10.1145/3178487.3178513

Impact study of data locality on task-based applications through the Heteroprio scheduler
journal, January 2019


A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations
text, January 2019


The future of computing beyond Moore’s Law
journal, January 2020

  • Shalf, John
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166
  • DOI: 10.1098/rsta.2019.0061
