OSTI.GOV — U.S. Department of Energy
Office of Scientific and Technical Information

Title: Trends in data locality abstractions for HPC systems

Abstract

The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.
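The abstract's central idea — a data structure that hides a locality-friendly layout behind a simple interface — can be illustrated with a minimal sketch. This example is not from the paper; the class name, tile size, and index scheme are illustrative assumptions showing one common form such an abstraction takes (a tiled array layout).

```python
# Hypothetical sketch of a data-locality abstraction: a tiled 2D array.
# Elements of each b x b tile are stored contiguously, so a tile-local
# traversal touches memory sequentially while callers keep using (i, j).

class TiledArray:
    """An n x n matrix stored in contiguous b x b tiles."""

    def __init__(self, n, b):
        assert n % b == 0, "tile size must divide matrix size"
        self.n, self.b = n, b
        self.data = [0.0] * (n * n)

    def _index(self, i, j):
        """Map a logical (i, j) to the tiled linear layout."""
        b = self.b
        tiles_per_row = self.n // b
        tile = (i // b) * tiles_per_row + (j // b)   # which tile holds (i, j)
        offset = (i % b) * b + (j % b)               # position inside the tile
        return tile * b * b + offset

    def __getitem__(self, ij):
        return self.data[self._index(*ij)]

    def __setitem__(self, ij, value):
        self.data[self._index(*ij)] = value

a = TiledArray(4, 2)
a[2, 3] = 7.0
print(a[2, 3])  # 7.0
```

The point of such abstractions, as the survey argues, is that the layout can change (tiled, Morton-order, NUMA-distributed) without touching application code that indexes with `a[i, j]`.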

Authors:
Unat, Didem [1]; Dubey, Anshu [2]; Hoefler, Torsten [3]; Shalf, John [4]; Abraham, Mark [5]; Bianco, Mauro [6]; Chamberlain, Bradford L. [7]; Cledat, Romain [8]; Edwards, H. Carter [9]; Finkel, Hal [10]; Fuerlinger, Karl [11]; Hannig, Frank [12]; Jeannot, Emmanuel [13]; Kamil, Amir [14]; Keasler, Jeff [15]; Kelly, Paul H. J. [16]; Leung, Vitus [9]; Ltaief, Hatem [17]; Maruyama, Naoya [18]; Newburn, Chris J. [19]; Pericas, Miquel [20]
  1. Koc Univ., Istanbul (Turkey)
  2. Argonne National Lab. (ANL), Lemont, IL (United States)
  3. ETH Zurich, Zurich (Switzerland)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  5. KTH Royal Institute of Technology, Solna (Sweden)
  6. Swiss National Supercomputing Centre (CSCS), Lugano (Switzerland)
  7. Cray Inc., Seattle, WA (United States)
  8. Intel Corp., Santa Clara, CA (United States)
  9. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  10. Argonne National Lab. (ANL), Argonne, IL (United States)
  11. Ludwig-Maximilians-Univ., Munich (Germany)
  12. Univ. of Erlangen-Nuremberg, Erlangen (Germany)
  13. INRIA Bordeaux Sud-Ouest, Talence (France)
  14. Univ. of Michigan, Ann Arbor, MI (United States)
  15. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  16. Imperial College, London (United Kingdom)
  17. King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia)
  18. RIKEN, Hyogo (Japan)
  19. Nvidia Corp., Santa Clara, CA (United States)
  20. Chalmers Univ. of Technology, Goteborg (Sweden)
Publication Date:
2017-05-10
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1356837
Report Number(s):
SAND-2017-3844J
Journal ID: ISSN 1045-9219; 652425
Grant/Contract Number:
AC04-94AL85000
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
IEEE Transactions on Parallel and Distributed Systems
Additional Journal Information:
Journal Volume: 28; Journal Issue: 10; Journal ID: ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; data locality; programming abstractions; high-performance computing; data layout; locality-aware runtimes

Citation Formats

Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., and Pericas, Miquel. Trends in data locality abstractions for HPC systems. United States: N. p., 2017. Web. doi:10.1109/tpds.2017.2703149.
Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., & Pericas, Miquel. Trends in data locality abstractions for HPC systems. United States. doi:10.1109/tpds.2017.2703149.
Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., and Pericas, Miquel. 2017. "Trends in data locality abstractions for HPC systems". United States. doi:10.1109/tpds.2017.2703149. https://www.osti.gov/servlets/purl/1356837.
@article{osti_1356837,
title = {Trends in data locality abstractions for HPC systems},
author = {Unat, Didem and Dubey, Anshu and Hoefler, Torsten and Shalf, John and Abraham, Mark and Bianco, Mauro and Chamberlain, Bradford L. and Cledat, Romain and Edwards, H. Carter and Finkel, Hal and Fuerlinger, Karl and Hannig, Frank and Jeannot, Emmanuel and Kamil, Amir and Keasler, Jeff and Kelly, Paul H. J. and Leung, Vitus and Ltaief, Hatem and Maruyama, Naoya and Newburn, Chris J. and Pericas, Miquel},
abstractNote = {The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.},
doi = {10.1109/tpds.2017.2703149},
journal = {IEEE Transactions on Parallel and Distributed Systems},
number = 10,
volume = 28,
place = {United States},
year = {2017},
month = {may}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Related records:
  • The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models.
  • Many abstractions of program dependences have already been proposed, such as the Dependence Distance, the Dependence Direction Vector, the Dependence Level or the Dependence Cone. These different abstractions have different precisions. The minimal abstraction associated with a transformation is the abstraction that contains the minimal amount of information necessary to decide when such a transformation is legal. Minimal abstractions for loop reordering and unimodular transformations are presented. As an example, the dependence cone, which approximates dependences by a convex cone of the dependence distance vectors, is the minimal abstraction for unimodular transformations. It also contains enough information for legally applying all loop reordering transformations and finding the same set of valid mono- and multi-dimensional linear schedules as the dependence distance set.
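    The legality test that the dependence-distance abstraction enables can be sketched concretely: a loop reordering is legal when every dependence distance vector, after permuting its components into the new loop order, remains lexicographically positive. This is a hypothetical illustration of that standard criterion, not code from the cited work.

    ```python
    # Sketch: legality of a loop interchange from dependence distance vectors.
    # A reordering is legal iff every permuted distance vector stays
    # lexicographically positive (the dependence still points "forward").

    def lex_positive(v):
        """True if vector v is lexicographically positive."""
        for x in v:
            if x > 0:
                return True
            if x < 0:
                return False
        return False  # all-zero vector: no loop-carried dependence

    def interchange_legal(distances, perm):
        """distances: dependence distance vectors, one per dependence.
        perm: new loop order, e.g. (1, 0) swaps a two-deep loop nest."""
        return all(lex_positive([d[p] for p in perm]) for d in distances)

    # Distance (1, -1) is carried by the outer loop; swapping loops turns it
    # into (-1, 1), which is lexicographically negative, so swapping is illegal.
    print(interchange_legal([(1, -1)], (1, 0)))  # False
    print(interchange_legal([(1, 1)], (1, 0)))   # True
    ```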
  • High Performance Computing (HPC) systems are composed of servers containing an ever-increasing number of cores. With such high processor core counts, non-uniform memory access (NUMA) architectures are almost universally used to reduce inter-processor and memory communication bottlenecks by distributing processors and memory throughout a server-internal networking topology. Application studies have shown that tuning process placement in a server's NUMA networking topology to the application can have a dramatic impact on performance. The performance implications are magnified when running a parallel job across multiple server nodes, especially with large-scale HPC applications. This paper presents the Locality-Aware Mapping Algorithm (LAMA) for distributing the individual processes of a parallel application across processing resources in an HPC system, paying particular attention to the internal server NUMA topologies. The algorithm is able to support both homogeneous and heterogeneous hardware systems, and dynamically adapts to the available hardware and user-specified process layout at run-time. As implemented in Open MPI, the LAMA provides 362,880 mapping permutations and is able to naturally scale out to additional hardware resources as they become available in future architectures.
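    The 362,880 figure in the LAMA abstract is 9!, i.e. the number of orderings of nine hardware levels over which processes can be laid out. The level names below are illustrative assumptions, not the exact identifiers from the paper; the sketch just shows where the count comes from.

    ```python
    # Sketch: why LAMA's quoted 362,880 mapping permutations equals 9!.
    # Each permutation is one order in which processes are distributed
    # across nine nested hardware levels (names here are illustrative).

    from itertools import permutations
    from math import factorial

    levels = ["node", "board", "socket", "numa",
              "l3", "l2", "l1", "core", "hwthread"]

    print(factorial(len(levels)))                # 362880
    print(sum(1 for _ in permutations(levels)))  # 362880
    ```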