Trends in data locality abstractions for HPC systems

Unat, Didem; Dubey, Anshu; Hoefler, Torsten; Shalf, John; Abraham, Mark; Bianco, Mauro; Chamberlain, Bradford L.; Cledat, Romain; Edwards, H. Carter; Finkel, Hal; Fuerlinger, Karl; Hannig, Frank; Jeannot, Emmanuel; Kamil, Amir; Keasler, Jeff; Kelly, Paul H. J.; Leung, Vitus; Ltaief, Hatem; Maruyama, Naoya; Newburn, Chris J.; Pericas, Miquel

doi:10.1109/tpds.2017.2703149

Title: Trends in data locality abstractions for HPC systems

Abstract

The cost of data movement has always been an important concern in high-performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.

Authors:

Unat, Didem ^[1]; Dubey, Anshu ^[2]; Hoefler, Torsten ^[3]; Shalf, John ^[4]; Abraham, Mark ^[5]; Bianco, Mauro ^[6]; Chamberlain, Bradford L. ^[7]; Cledat, Romain ^[8]; Edwards, H. Carter ^[9]; Finkel, Hal ^[10]; Fuerlinger, Karl ^[11]; Hannig, Frank ^[12]; Jeannot, Emmanuel ^[13]; Kamil, Amir ^[14]; Keasler, Jeff ^[15]; Kelly, Paul H. J. ^[16]; Leung, Vitus ^[9]; Ltaief, Hatem ^[17]; Maruyama, Naoya ^[18]; Newburn, Chris J. ^[19]; Pericas, Miquel ^[20]

Koc Univ., Istanbul (Turkey)
Argonne National Lab. (ANL), Lemont, IL (United States)
ETH Zurich, Zurich (Switzerland)
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
KTH Royal Institute of Technology, Solna (Sweden)
Swiss National Supercomputer, Lugano (Switzerland)
Cray Inc., Seattle, WA (United States)
Intel Corp., Santa Clara, CA (United States)
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Argonne National Lab. (ANL), Argonne, IL (United States)
Ludwig-Maximilians-Univ., Munich (Germany)
Univ. of Erlangen-Nuremberg, Erlangen (Germany)
INRIA Bordeaux Sud-Ouest, Talence (France)
Univ. of Michigan, Ann Arbor, MI (United States)
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Imperial College, London (United Kingdom)
King Abdullah Univ. of Science and Technology, Thuwal (Saudi Arabia)
RIKEN, Hyogo (Japan)
Nvidia Corp., Santa Clara, CA (United States)
Chalmers Univ. of Technology, Goteborg (Sweden)

Publication Date: 2017-05-10

Research Org.: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Argonne National Laboratory (ANL), Argonne, IL (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Org.: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); German Research Foundation (DFG)

OSTI Identifier: 1356837

Alternate Identifier(s): OSTI ID: 1393262; OSTI ID: 1525244

Report Number(s): SAND-2017-3844J
Journal ID: ISSN 1045-9219; 652425

Grant/Contract Number: AC04-94AL85000; AC02-06CH11357; AC02-05CH11231

Resource Type: Accepted Manuscript

Journal Name: IEEE Transactions on Parallel and Distributed Systems

Additional Journal Information: Journal Volume: 28; Journal Issue: 10; Journal ID: ISSN 1045-9219

Publisher: IEEE

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; data locality; programming abstractions; high-performance computing; data layout; locality-aware runtimes

Citation Formats


Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., and Pericas, Miquel. Trends in data locality abstractions for HPC systems. United States: N. p., 2017. Web. doi:10.1109/tpds.2017.2703149.

Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., & Pericas, Miquel. Trends in data locality abstractions for HPC systems. United States. https://doi.org/10.1109/tpds.2017.2703149

Unat, Didem, Dubey, Anshu, Hoefler, Torsten, Shalf, John, Abraham, Mark, Bianco, Mauro, Chamberlain, Bradford L., Cledat, Romain, Edwards, H. Carter, Finkel, Hal, Fuerlinger, Karl, Hannig, Frank, Jeannot, Emmanuel, Kamil, Amir, Keasler, Jeff, Kelly, Paul H. J., Leung, Vitus, Ltaief, Hatem, Maruyama, Naoya, Newburn, Chris J., and Pericas, Miquel. 2017. "Trends in data locality abstractions for HPC systems". United States. https://doi.org/10.1109/tpds.2017.2703149. https://www.osti.gov/servlets/purl/1356837.


                    
@article{osti_1356837,
  title        = {Trends in data locality abstractions for HPC systems},
  author       = {Unat, Didem and Dubey, Anshu and Hoefler, Torsten and Shalf, John and Abraham, Mark and Bianco, Mauro and Chamberlain, Bradford L. and Cledat, Romain and Edwards, H. Carter and Finkel, Hal and Fuerlinger, Karl and Hannig, Frank and Jeannot, Emmanuel and Kamil, Amir and Keasler, Jeff and Kelly, Paul H. J. and Leung, Vitus and Ltaief, Hatem and Maruyama, Naoya and Newburn, Chris J. and Pericas, Miquel},
  abstractNote = {The cost of data movement has always been an important concern in high-performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.},
  doi          = {10.1109/tpds.2017.2703149},
  journal      = {IEEE Transactions on Parallel and Distributed Systems},
  number       = 10,
  volume       = 28,
  place        = {United States},
  year         = {2017},
  month        = {May}
}


Journal Article:

Free Publicly Available Full Text

Accepted Manuscript (DOE)

Publisher's Version of Record

https://doi.org/10.1109/tpds.2017.2703149


Citation Metrics:

Cited by: 49 works

Citation information provided by
Web of Science

Figures / Tables:

Fig. 1: Illustration of concepts that are important for data locality in a dense two-dimensional array. An example iteration space, traversal order, decomposition, data placement and data layout are shown.

All figures and tables (2 total)


Works referenced in this record:

Polly-ACC Transparent compilation to heterogeneous hardware
conference, June 2016

Grosser, Tobias; Hoefler, Torsten
Proceedings of the 2016 International Conference on Supercomputing
DOI: 10.1145/2925426.2926286

Designing a unified programming model for heterogeneous machines
conference, November 2012

Garland, Michael; Kudlur, Manjunath; Zheng, Yili
2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/sc.2012.48

Communication and topology-aware load balancing in Charm++ with TreeMatch
conference, September 2013

Jeannot, Emmanuel; Meneses, Esteban; Mercier, Guillaume
2013 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/cluster.2013.6702666

The Organization of Computations for Uniform Recurrence Equations
journal, July 1967

Karp, Richard M.; Miller, Raymond E.; Winograd, Shmuel
Journal of the ACM, Vol. 14, Issue 3
DOI: 10.1145/321406.321418

A practical automatic polyhedral parallelizer and locality optimizer
journal, May 2008

Bondhugula, Uday; Hartono, Albert; Ramanujam, J.
ACM SIGPLAN Notices, Vol. 43, Issue 6
DOI: 10.1145/1379022.1375595

Netloc: Towards a Comprehensive View of the HPC System Topology
conference, September 2014

Goglin, Brice; Hursey, Joshua; Squyres, Jeffrey M.
2014 43rd International Conference on Parallel Processing Workshops
DOI: 10.1109/icppw.2014.38

Modesto
conference, June 2015

Gysi, Tobias; Grosser, Tobias; Hoefler, Torsten
Proceedings of the 29th ACM on International Conference on Supercomputing
DOI: 10.1145/2751205.2751223

The Scalasca performance toolset architecture
journal, January 2010

Geimer, Markus; Wolf, Felix; Wylie, Brian J. N.
Concurrency and Computation: Practice and Experience
DOI: 10.1002/cpe.1556

Active Libraries: Rethinking the roles of compilers and libraries
preprint, January 1998

Veldhuizen, Todd L.; Gannon, Dennis
arXiv
DOI: 10.48550/arxiv.math/9810022

Slim Fly: A Cost Effective Low-Diameter Network Topology
text, January 2019

Besta, Maciej; Hoefler, Torsten
arXiv
DOI: 10.48550/arxiv.1912.08968

A survey of high level frameworks in block-structured adaptive mesh refinement packages
journal, December 2014

Dubey, Anshu; Almgren, Ann; Bell, John
Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
DOI: 10.1016/j.jpdc.2014.07.001

Software Engineering for Computational Science and Engineering
journal, March 2012

Carver, Jeffrey C.
Computing in Science & Engineering, Vol. 14, Issue 2
DOI: 10.1109/MCSE.2012.31

The ASC-Alliance Projects: A Case Study of Large-Scale Parallel Scientific Code Development
journal, March 2008

Hochstein, L.; Basili, V. R.
Computer, Vol. 41, Issue 3
DOI: 10.1109/MC.2008.101

Developing Scientific Software
journal, July 2008

Segal, Judith; Morris, Chris
IEEE Software, Vol. 25, Issue 4
DOI: 10.1109/MS.2008.85

Scalable molecular dynamics with NAMD
journal, January 2005

Phillips, James C.; Braun, Rosemary; Wang, Wei
Journal of Computational Chemistry, Vol. 26, Issue 16, p. 1781-1802
DOI: 10.1002/jcc.20289

A component-based architecture for parallel multi-physics PDE simulation
journal, January 2006

Parker, Steven G.
Future Generation Computer Systems, Vol. 22, Issue 1-2
DOI: 10.1016/j.future.2005.04.001

Legion: Expressing locality and independence with logical regions
conference, November 2012

Bauer, Michael; Treichler, Sean; Slaughter, Elliott
2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/SC.2012.71

Polyhedral parallel code generation for CUDA
journal, January 2013

Verdoolaege, Sven; Carlos Juega, Juan; Cohen, Albert
ACM Transactions on Architecture and Code Optimization, Vol. 9, Issue 4
DOI: 10.1145/2400682.2400713

Cache-oblivious algorithms
conference, January 1999

Frigo, M.; Leiserson, C. E.; Prokop, H.
40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)
DOI: 10.1109/SFFCS.1999.814600

A batch scheduler with high level components
conference, January 2005

Capit, N.; Da Costa, G.; Georgiou, Y.
CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, 2005.
DOI: 10.1109/CCGRID.2005.1558641

Job scheduling under the Portable Batch System
book, January 1995

Henderson, Robert L.
Job Scheduling Strategies for Parallel Processing
DOI: 10.1007/3-540-60153-8_34

UPC++: A PGAS Extension for C++
conference, May 2014

Zheng, Yili; Kamil, Amir; Driscoll, Michael B.
2014 IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2014 IEEE 28th International Parallel and Distributed Processing Symposium
DOI: 10.1109/IPDPS.2014.115

A practical automatic polyhedral parallelizer and locality optimizer
conference, January 2008

Bondhugula, Uday; Hartono, Albert; Ramanujam, J.
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation - PLDI '08
DOI: 10.1145/1375581.1375595

On implementing MPI-IO portably and with high performance
conference, January 1999

Thakur, Rajeev; Gropp, William; Lusk, Ewing
Proceedings of the sixth workshop on I/O in parallel and distributed systems - IOPADS '99
DOI: 10.1145/301816.301826

Parallel netCDF: A High-Performance Scientific I/O Interface
conference, January 2003

Li, Jianwei; Zingale, Michael; Liao, Wei-keng
Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03
DOI: 10.1145/1048935.1050189

Exascale Computing Trends: Adjusting to the "New Normal" for Computer Architecture
journal, November 2013

Kogge, Peter; Shalf, John
Computing in Science & Engineering, Vol. 15, Issue 6
DOI: 10.1109/MCSE.2013.95

Co-array Fortran for parallel programming
journal, August 1998

Numrich, Robert W.; Reid, John
ACM SIGPLAN Fortran Forum, Vol. 17, Issue 2
DOI: 10.1145/289918.289920

ZPL: a machine independent programming language for parallel computers
journal, March 2000

Chamberlain, B. L.; Lewis, C.
IEEE Transactions on Software Engineering, Vol. 26, Issue 3
DOI: 10.1109/32.842947

Titanium: a high-performance Java dialect
journal, September 1998

Yelick, Kathy; Semenzato, Luigi; Pike, Geoff
Concurrency: Practice and Experience, Vol. 10, Issue 11-13
DOI: 10.1002/(SICI)1096-9128(199809/11)10:11/13<825::AID-CPE383>3.0.CO;2-H

CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination
conference, May 2014

Dorier, Matthieu; Antoniu, Gabriel; Ross, Rob
2014 IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2014 IEEE 28th International Parallel and Distributed Processing Symposium
DOI: 10.1109/IPDPS.2014.27

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory
journal, May 2013

Hoefler, Torsten; Dinan, James; Buntinas, Darius
Computing, Vol. 95, Issue 12
DOI: 10.1007/s00607-013-0324-2

SLURM: Simple Linux Utility for Resource Management
book, January 2003

Yoo, Andy B.; Jette, Morris A.; Grondona, Mark
Job Scheduling Strategies for Parallel Processing
DOI: 10.1007/10968987_3

Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc)
conference, July 2014

Goglin, Brice
2014 International Conference on High Performance Computing & Simulation (HPCS)
DOI: 10.1109/HPCSim.2014.6903671

DASH: Data Structures and Algorithms with Support for Hierarchical Locality
book, January 2014

Fürlinger, Karl; Glass, Colin; Gracia, Jose
Lecture Notes in Computer Science
DOI: 10.1007/978-3-319-14313-2_46

BoxLib with Tiling: An Adaptive Mesh Refinement Software Framework
journal, January 2016

Zhang, Weiqun; Almgren, Ann; Day, Marcus
SIAM Journal on Scientific Computing, Vol. 38, Issue 5
DOI: 10.1137/15M102616X

A new vision for coarray Fortran
conference, January 2009

Mellor-Crummey, John; Adhianto, Laksono; Scherer, William N.
Proceedings of the Third Conference on Partitioned Global Address Space Programing Models - PGAS '09
DOI: 10.1145/1809961.1809969

Parallel Programmability and the Chapel Language
journal, August 2007

Chamberlain, B. L.; Callahan, D.; Zima, H. P.
The International Journal of High Performance Computing Applications, Vol. 21, Issue 3
DOI: 10.1177/1094342007078442

Programming for parallelism and locality with hierarchically tiled arrays
conference, January 2006

Bikshandi, Ganesh; Guo, Jia; Hoeflinger, Daniel
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '06
DOI: 10.1145/1122971.1122981

Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers
conference, November 2016

Tessier, Francois; Malakar, Preeti; Vishwanath, Venkatram
2016 First International Workshop on Communication Optimizations in HPC (COMHPC)
DOI: 10.1109/COMHPC.2016.013

Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
journal, December 2014

Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel
Journal of Parallel and Distributed Computing, Vol. 74, Issue 12
DOI: 10.1016/j.jpdc.2014.07.003

Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks
conference, June 2014

Prisacari, Bogdan; Rodriguez, German; Heidelberger, Philip
Proceedings of the 23rd international symposium on High-performance parallel and distributed computing
DOI: 10.1145/2600212.2600225

Asymptotically Optimal Load Balancing for Hierarchical Multi-Core Systems
conference, December 2012

Pilla, Laercio L.; Navaux, Philippe O. A.; Ribeiro, Christiane P.
2012 IEEE 18th International Conference on Parallel and Distributed Systems
DOI: 10.1109/ICPADS.2012.41

Work-stealing with configurable scheduling strategies
conference, February 2013

Wimmer, Martin; Cederman, Daniel; Träff, Jesper Larsson
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
DOI: 10.1145/2442516.2442562

OpenMP task scheduling strategies for multicore NUMA systems
journal, February 2012

Olivier, Stephen L.; Porterfield, Allan K.; Wheeler, Kyle B.
The International Journal of High Performance Computing Applications, Vol. 26, Issue 2
DOI: 10.1177/1094342011434065

Technology-Driven, Highly-Scalable Dragonfly Topology
journal, June 2008

Kim, John; Dally, William J.; Scott, Steve
ACM SIGARCH Computer Architecture News, Vol. 36, Issue 3
DOI: 10.1145/1394608.1382129

Design and implementation of a customizable work stealing scheduler
conference, June 2013

Nakashima, Jun; Nakatani, Sho; Taura, Kenjiro
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
DOI: 10.1145/2491661.2481433

Works referencing / citing this record:

swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures
journal, March 2018

Wang, Xinliang; Liu, Weifeng; Xue, Wei
ACM SIGPLAN Notices, Vol. 53, Issue 1
DOI: 10.1145/3200691.3178513

EagerMap
journal, December 2018

Cruz, Eduardo H. M.; Diener, Matthias; Pilla, Laércio L.
ACM Transactions on Parallel Computing, Vol. 5, Issue 4
DOI: 10.1145/3309711

Memory‐aware kernel mechanism and policies for improving internode load balancing on NUMA systems
journal, July 2019

Chiang, Mei‐Ling; Su, Wei‐Lun; Tu, Shu‐Wei
Software: Practice and Experience, Vol. 49, Issue 10
DOI: 10.1002/spe.2731

Data Movement Is All You Need: A Case Study on Optimizing Transformers
preprint, January 2020

Ivanov, Andrei; Dryden, Nikoli; Ben-Nun, Tal
arXiv
DOI: 10.48550/arxiv.2007.00072

swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures
conference, February 2018

Wang, Xinliang; Liu, Weifeng; Xue, Wei
PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
DOI: 10.1145/3178487.3178513

Impact study of data locality on task-based applications through the Heteroprio scheduler
journal, January 2019

Bramas, Bérenger
PeerJ Computer Science, Vol. 5
DOI: 10.7717/peerj-cs.190

A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations
text, January 2019

Ziogas, Alexandros Nikolaos; Ben-Nun, Tal; Fernández, Guillermo Indalecio
arXiv
DOI: 10.48550/arxiv.1912.10024

The future of computing beyond Moore’s Law
journal, January 2020

Shalf, John
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166
DOI: 10.1098/rsta.2019.0061

Figures / Tables found in this record:

Fig. 1 (p. 3)

TABLE 1 (p. 4)

Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.

Similar Records in DOE PAGES and OSTI.GOV collections:

...And Eat it Too: High Read Performance in Write-Optimized HPC I/O Middleware File Formats

Conference: Klasky, Scott A; Lofstead, J.; Bent, John; ...

As HPC applications run on increasingly high process counts on larger and larger machines, both the frequency of checkpoints needed for fault tolerance and the resolution and size of Data Analysis Dumps are expected to increase proportionally. In order to maintain an acceptable ratio of time spent performing useful computation work to time spent performing I/O, write bandwidth to the underlying storage system must increase proportionally to this increase in the checkpoint and computation size. Unfortunately, popular scientific self-describing file formats such as netCDF and HDF5 are designed with a focus on portability and flexibility. Extra care and careful crafting ...

Mastering HPC Runtime Prediction: From Observing Patterns to a Methodological Approach: Preprint

Conference: Menear, Kevin; Nag, Ambarish; Perr-Sauer, Jordan; ...

The continual expansion of high-performance computing (HPC) brings with it an increasing need for efficiency. Heavy investment in energy, hardware, and software infrastructure to support peta- and exascale computing requires the optimization of existing systems and, wherever possible, the discernment and adoption of best-practices towards these goals. Such is the case for runtime prediction. When a job is submitted to an HPC system, an estimate of its runtime is provided by the user in the form of "requested wallclock". Error in this user-provided estimate can lead to jobs being prematurely killed by the scheduler, increased wait time on the queue, ...

Mastering HPC Runtime Prediction: From Observing Patterns to a Methodological Approach

Conference: Menear, Kevin; Nag, Ambarish; Perr-Sauer, Jordan; ...

The continual expansion of high-performance computing (HPC) brings with it an increasing need for efficiency. Heavy investment in energy, hardware, and software infrastructure to support peta- and exascale computing requires the optimization of existing systems and, wherever possible, the discernment and adoption of best-practices towards these goals. Such is the case for runtime prediction. When a job is submitted to an HPC system, an estimate of its runtime is provided by the user in the form of "requested wallclock". Error in this user-provided estimate can lead to jobs being prematurely killed by the scheduler, increased wait time on the queue, ...
https://doi.org/10.1145/3569951.3593598

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)

Technical Report: Shen, Xipeng

The development of modern processors exhibits two trends that complicate the optimizations of modern software. The first is the increasing sensitivity of processors' throughput to irregularities in computation. With more processors produced through a massive integration of simple cores, future systems will increasingly favor regular data-level parallel computations, but deviate from the needs of applications with complex patterns. Some evidences are already shown on Graphic Processing Units (GPU): Irregular data accesses (e.g., indirect references A[D[i]]) and conditional branches are limiting many GPU applications' performance at a level an order of magnitude lower than the peak of GPU. The second hardware ...
https://doi.org/10.2172/1576175

HPC-Colony: Services and Interfaces to Support Systems With Very Large Numbers of Processors

Technical Report: Jones, T; Kale, L; Moreira, J; ...

The HPC-Colony Project, a collaboration with Lawrence Livermore National Laboratory, the University of Illinois at Urbana-Champaign and IBM, is focused on services and interfaces for very large numbers of processors. Advances in parallel systems in the last decade have delivered phenomenal progress in the overall capability available to a single parallel application. Several systems with peak capability of over 100TF are already available and systems are expected to exceed 1PF within a few years. Despite these impressive advances in peak performance capability, the sustained performance of these systems continues to fall as a percentage of the peak capability. Initial analysis ...
https://doi.org/10.2172/902273


ZPL: a machine independent programming language for parallel computers
journal, March 2000

Titanium: a high-performance Java dialect
journal, September 1998

CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination
conference, May 2014

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory
journal, May 2013

SLURM: Simple Linux Utility for Resource Management
book, January 2003

Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc)
conference, July 2014

The Scalasca performance toolset architecture
journal, January 2010

DASH: Data Structures and Algorithms with Support for Hierarchical Locality
book, January 2014

BoxLib with Tiling: An Adaptive Mesh Refinement Software Framework
journal, January 2016

Modesto
conference, June 2015

A new vision for coarray Fortran
conference, January 2009

Parallel Programmability and the Chapel Language
journal, August 2007
