Closeout Report for DE-SC0018121

Detmold, William

doi:10.2172/1971643


Technical Report · April 28, 2023

DOI: https://doi.org/10.2172/1971643 · OSTI ID: 1971643


Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)

To add more capabilities to Halide, we designed a new framework called Tiramisu and integrated it into Halide. Because Tiramisu enables Halide to target heterogeneous architectures, our development efforts have been refocused on Tiramisu. Most high-performance computer systems today are complex and increasingly heterogeneous: they may combine CPUs, GPUs and FPGAs, and achieving the best performance requires taking full advantage of all of these architectures.

Tiramisu is an optimization framework that enables Halide (and other domain-specific languages, or DSLs) to target such heterogeneous systems. It takes as input a high-level, architecture-independent representation of code together with a set of scheduling and data-mapping commands that guide code transformation. The input can either be generated by a DSL compiler such as Halide or written directly by a programmer. Tiramisu then applies the user-specified code and data-layout transformations and generates an architecture-specific, low-level intermediate representation (IR) that takes advantage of modern architectural features such as multicore parallelism, non-uniform memory access (NUMA) hierarchies, clusters, and accelerators like GPUs and FPGAs.

We integrated Tiramisu within Halide and implemented a representative set of benchmarks to evaluate this integration. Tiramisu is now open source and available for public use (http://tiramisu-compiler.org/). A published paper on Tiramisu shows that it extends Halide with many new capabilities and that it can generate efficient code for multicores, GPUs, FPGAs and distributed heterogeneous systems. The performance of code generated by the Tiramisu backends matches or exceeds that of hand-optimized reference implementations.
For example, the multicore backend matches the highly optimized Intel MKL library on many kernels and shows speedups of up to 4x over the original Halide.

In addition to making Tiramisu more robust, we have used it to implement a set of representative tensor operations for constructing the baryon building blocks required for multi-baryon contractions in lattice QCD (LQCD). To implement this code, we needed to generalize Tiramisu in two ways: first, to support indirect array accesses, and second, to support complex numbers. The code generated by Tiramisu is 6x faster than the reference code. Our work toward an MPI-based multi-node version of Tiramisu has matured, and the resulting code scales well across multiple nodes (tests on up to 512 Intel Knights Landing (KNL) nodes have been undertaken).
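
The central idea described above, an algorithm written once in architecture-independent form plus separate scheduling commands that decide how it is executed, can be illustrated with a small Python sketch. This is not Tiramisu's API and not the LQCD baryon-block code: the functions `algorithm`, `schedule_i_outer`, `schedule_k_outer`, `schedule_tiled` and `run`, and the sample data, are hypothetical names chosen only for illustration. The statement deliberately uses an indirect array access (`data[idx[k]][i]`) and complex-valued operands, the two features the report says had to be added to Tiramisu. Each schedule changes only the order in which iteration points are visited.

```python
# Hypothetical sketch: one algorithm, several schedules.
# NOT the Tiramisu API; every name here is an illustrative assumption.


def algorithm(out, weights, data, idx, i, k):
    """Architecture-independent statement for one iteration point (i, k).

    Reads `data` indirectly through `idx` and accumulates complex products.
    """
    out[i] += weights[k] * data[idx[k]][i]


def schedule_i_outer(n_i, n_k):
    """Visit points with i as the outer loop and k as the inner loop."""
    for i in range(n_i):
        for k in range(n_k):
            yield i, k


def schedule_k_outer(n_i, n_k):
    """Interchanged loop order: k outer, i inner."""
    for k in range(n_k):
        for i in range(n_i):
            yield i, k


def schedule_tiled(n_i, n_k, tile=2):
    """Tile the i loop: tiles of i outermost, then k, then i within the tile."""
    for i0 in range(0, n_i, tile):
        for k in range(n_k):
            for i in range(i0, min(i0 + tile, n_i)):
                yield i, k


def run(schedule, weights, data, idx, n_i):
    """Execute the same algorithm over every point the schedule produces."""
    out = [0j] * n_i
    for i, k in schedule(n_i, len(idx)):
        algorithm(out, weights, data, idx, i, k)
    return out


if __name__ == "__main__":
    weights = [1 + 1j, 0.5, -2j]
    data = [[1, 2, 3, 4], [1j, 1j, 1j, 1j], [2, 0, -1, 1 + 1j]]
    idx = [2, 0, 2]  # indirect indices: row 2 is read twice, row 1 never
    results = [run(s, weights, data, idx, n_i=4)
               for s in (schedule_i_outer, schedule_k_outer, schedule_tiled)]
    assert results[0] == results[1] == results[2]
    print(results[0])
```

In this toy every schedule accumulates the `k` contributions to a given `out[i]` in the same ascending order, so the floating-point results agree exactly. A real compiler must additionally check data dependences before it may legally reorder a loop nest.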


Research Organization: Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)

Sponsoring Organization: USDOE Office of Science (SC), Nuclear Physics (NP)

DOE Contract Number: SC0018121

OSTI ID: 1971643

Report Number(s): DE-SC0018121

Country of Publication: United States

Language: English

Similar Records

OpenACC to FPGA: A Framework for Directive-based High-Performance Reconfigurable Computing

Conference · May 1, 2016

Lee, Seyong; Kim, Jungwon; Vetter, Jeffrey S.

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)

Technical Report · November 29, 2019

Shen, Xipeng

Tensor Contraction and Operation Minimization for Extreme Scale Computational Chemistry

Technical Report · February 17, 2021

Sabin, Gerald; Sadayappan, P.

Related Subjects

73 NUCLEAR PHYSICS AND RADIATION PHYSICS
