skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performance Portability Strategies for Grid C++ Expression Templates

Authors:
;
Publication Date:
Research Org.:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21)
OSTI Identifier:
1424990
Report Number(s):
BNL-114831-2017-FORE
DOE Contract Number:
SC0012704
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Boyle, Peter, and Lin, Meifing. Performance Portability Strategies for Grid C++ Expression Templates. United States: N. p., 2017. Web. doi:10.2172/1424990.
Boyle, Peter, & Lin, Meifing. Performance Portability Strategies for Grid C++ Expression Templates. United States. doi:10.2172/1424990.
Boyle, Peter, and Lin, Meifing. Tue . "Performance Portability Strategies for Grid C++ Expression Templates". United States. doi:10.2172/1424990. https://www.osti.gov/servlets/purl/1424990.
@article{osti_1424990,
title = {Performance Portability Strategies for Grid C++ Expression Templates},
author = {Boyle, Peter and Lin, Meifing},
abstractNote = {},
doi = {10.2172/1424990},
journal = {},
number = ,
volume = ,
place = {United States},
year = {Tue Jan 10 00:00:00 EST 2017},
month = {Tue Jan 10 00:00:00 EST 2017}
}

Technical Report:

Save / Share:
  • This report outlines the research, development, and support requirements for the Advanced Simulation and Computing (ASC ) Advanced Technology, Development, and Mitigation (ATDM) Performance Portability (a.k.a., Kokkos) project for 2015 - 2019 . The research and development (R&D) goal for Kokkos (v2) has been to create and demonstrate a thread - parallel programming model a nd standard C++ library - based implementation that enables performance portability across diverse manycore architectures such as multicore CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. This R&D goal has been achieved for algorithms that use data parallel pat terns including parallel - for, parallelmore » - reduce, and parallel - scan. Current R&D is focusing on hierarchical parallel patterns such as a directed acyclic graph (DAG) of asynchronous tasks where each task contain s nested data parallel algorithms. This five y ear plan includes R&D required to f ully and performance portably exploit thread parallelism across current and anticipated next generation platforms (NGP). The Kokkos library is being evaluated by many projects exploring algorithm s and code design for NGP. Some production libraries and applications such as Trilinos and LAMMPS have already committed to Kokkos as their foundation for manycore parallelism an d performance portability. These five year requirements includes support required for current and antic ipated ASC projects to be effective and productive in their use of Kokkos on NGP. The greatest risk to the success of Kokkos and ASC projects relying upon Kokkos is a lack of staffing resources to support Kokkos to the degree needed by these ASC projects. This support includes up - to - date tutorials, documentation, multi - platform (hardware and software stack) testing, minor feature enhancements, thread - scalable algorithm consulting, and managing collaborative R&D.« less
  • Modern HPC software packages are rarely self-contained. They depend on a large number of external libraries, and many spend large fractions of their runtime in external subroutines. Performance portability depends not only on the effort of application teams, but also on the availability of well-tuned libraries. At most sites, the burden of maintaining libraries is shared by code teams and facilities. Facilities typically provide well-tuned default versions, but code teams frequently build with bleeding-edge compilers to achieve high performance. For this reason, HPC has no “standard” software stack, unlike other domains where performance is not critical. Incompatibilities among compilers andmore » software versions force application teams and facility staff to re-build custom versions of libraries for each new toolchain. Because the number of potential configurations is combinatorial, and because HPC software is notoriously difficult to port to new machines [3, 7, 8], the tuning effort required to support and maintain performance-portable libraries outstrips the available manpower at most sites. Software complexity is a growing obstacle to performance portability for HPC.« less
  • Performance portability is a phrase often used, but not well understood. The DOE is deploying systems at all of the major facilities across ASCR and ASC that are forcing application developers to confront head-on the challenges of running applications across these diverse systems. With GPU-based systems at the OLCF and LLNL, and Phi-based systems landing at NERSC, ACES (LANL/SNL), and the ALCF – the issue of performance portability is confronting the DOE mission like never before. A new best practice in the DOE is to include “Centers of Excellence” with each major procurement, with a goal of focusing efforts onmore » preparing key applications to be ready for the systems coming to each site, and engaging the vendors directly in a “shared fate” approach to ensuring success. While each COE is necessarily focused on a particular deployment, applications almost invariably must be able to run effectively across the entire DOE HPC ecosystem. This tension between optimizing performance for a particular platform, while still being able to run with acceptable performance wherever the resources are available, is the crux of the challenge we call “performance portability”. This meeting was an opportunity to bring application developers, software providers, and vendors together to discuss this challenge and begin to chart a path forward.« less
  • This milestone is a tri-lab deliverable supporting ongoing Co-Design efforts impacting applications in the Integrated Codes (IC) program element Advanced Technology Development and Mitigation (ATDM) program element. In FY14, the trilabs looked at porting proxy application to technologies of interest for ATS procurements. In FY15, a milestone was completed evaluating proxy applications in multiple programming models and in FY16, a milestone was completed focusing on the migration of lessons learned back into production code development. This year, the co-design milestone focuses on extracting the knowledge gained and/or code revisions back into production applications.