Title: Trends in data locality abstractions for HPC systems

Journal Article · IEEE Transactions on Parallel and Distributed Systems
[1]; [2]; [3]; [4]; [5]; [6]; [7]; [8]; [9]; [10]; [11]; [12]; [13]; [14]; [15]; [16]; [9]; [17]; [18]; [19]; [20]
  1. Koc Univ., Istanbul (Turkey)
  2. Argonne National Lab. (ANL), Lemont, IL (United States)
  3. ETH Zurich, Zurich (Switzerland)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  5. KTH Royal Institute of Technology, Solna (Sweden)
  6. Swiss National Supercomputer, Lugano (Switzerland)
  7. Cray Inc., Seattle, WA (United States)
  8. Intel Corp., Santa Clara, CA (United States)
  9. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  10. Argonne National Lab. (ANL), Argonne, IL (United States)
  11. Ludwig-Maximilians-Univ., Munich (Germany)
  12. Univ. of Erlangen-Nuremberg, Erlangen (Germany)
  13. INRIA Bordeaux Sud-Ouest, Talence (France)
  14. Univ. of Michigan, Ann Arbor, MI (United States)
  15. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  16. Imperial College, London (United Kingdom)
  17. King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia)
  18. RIKEN, Hyogo (Japan)
  19. Nvidia Corp., Santa Clara, CA (United States)
  20. Chalmers Univ. of Technology, Goteborg (Sweden)

The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Argonne National Laboratory (ANL), Argonne, IL (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); German Research Foundation (DFG)
Grant/Contract Number:
AC04-94AL85000; AC02-06CH11357; AC02-05CH11231
OSTI ID:
1356837
Alternate ID(s):
OSTI ID: 1393262; OSTI ID: 1525244
Report Number(s):
SAND-2017-3844J; 652425
Journal Information:
IEEE Transactions on Parallel and Distributed Systems, Vol. 28, Issue 10; ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 49 works
Citation information provided by
Web of Science
