OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Closeout Report for DE-SC0018121

Technical Report · DOI: https://doi.org/10.2172/1971643 · OSTI ID: 1971643
 [1]
  1. Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)

To add capabilities to Halide, we designed a new framework called Tiramisu and integrated it into Halide. Because Tiramisu enables Halide to target heterogeneous architectures, our development efforts have been refocused on Tiramisu. Most high-performance computer systems today are complex and increasingly heterogeneous; they may combine CPUs, GPUs, and FPGAs, and achieving the best performance requires taking full advantage of all of these architectures. To address this issue, we designed Tiramisu as an optimization framework that enables Halide (and other domain-specific languages, or DSLs) to target heterogeneous architectures. Tiramisu takes as input a high-level, architecture-independent representation of code and a set of scheduling and data-mapping commands that guide code transformation. The input can either be generated by a DSL compiler such as Halide or written directly by a programmer. Tiramisu then applies the user-specified code and data-layout transformations and generates an architecture-specific, low-level intermediate representation (IR) that takes advantage of modern architectural features such as multicore parallelism, non-uniform memory access (NUMA) hierarchies, clusters, and accelerators like GPUs and FPGAs. We integrated Tiramisu within Halide and implemented a representative set of benchmarks to evaluate this integration. Tiramisu is now open source and available for public use (http://tiramisu-compiler.org/). A published paper on Tiramisu shows that it extends Halide with many new capabilities and can generate efficient code for multicores, GPUs, FPGAs, and distributed heterogeneous systems. The performance of code generated by the Tiramisu backends matches or exceeds that of hand-optimized reference implementations.
For example, the multicore backend matches the highly optimized Intel MKL library on many kernels and shows speedups of up to 4x over the original Halide. In addition to making Tiramisu more robust, we used Tiramisu to implement a set of representative tensor operations for constructing the baryon building blocks required for multi-baryon contractions in lattice QCD (LQCD). To implement this code, we needed to generalize Tiramisu in two ways: first, to support indirect array accesses, and second, to support complex numbers. The code generated by Tiramisu is 6x faster than the reference code. Our efforts toward an MPI-based multi-node version of Tiramisu have matured, and the resulting code scales well across multiple nodes (tests on up to 512 KNL nodes have been carried out).
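
The separation the abstract describes, between an architecture-independent description of the computation and the scheduling commands that transform it, can be illustrated with a minimal sketch. The code below is plain Python, not the Tiramisu or Halide API; the names `algorithm`, `row_major`, `interchanged`, `tiled`, and `run`, the 3-point average, and the tile size are all hypothetical and chosen only for illustration. It runs one computation under three loop orders and relies on the fact that each output point is independent, so the results should be identical.

```python
# Illustrative sketch only: plain Python, NOT the Tiramisu or Halide API.
# All function names and the example computation are hypothetical.
from itertools import product


def algorithm(a, i, j):
    """What to compute at output point (i, j): a 3-point horizontal average."""
    return (a[i][j] + a[i][j + 1] + a[i][j + 2]) / 3.0


def row_major(rows, cols):
    """Default schedule: visit output points row by row."""
    return list(product(range(rows), range(cols)))


def interchanged(rows, cols):
    """Loop interchange: visit output points column by column."""
    return [(i, j) for j in range(cols) for i in range(rows)]


def tiled(rows, cols, tile=2):
    """Tiling: visit one tile-by-tile block of output points at a time."""
    order = []
    for bi in range(0, rows, tile):
        for bj in range(0, cols, tile):
            for i in range(bi, min(bi + tile, rows)):
                for j in range(bj, min(bj + tile, cols)):
                    order.append((i, j))
    return order


def run(a, schedule):
    """Apply the algorithm at every output point, in the order the schedule gives."""
    rows, cols = len(a), len(a[0]) - 2  # two columns are consumed by the stencil
    out = [[0.0] * cols for _ in range(rows)]
    for i, j in schedule(rows, cols):
        out[i][j] = algorithm(a, i, j)
    return out


# A small 4 x 6 input; the output is 4 x 4.
a = [[float(i * 6 + j) for j in range(6)] for i in range(4)]
reference = run(a, row_major)
same_after_interchange = run(a, interchanged) == reference
same_after_tiling = run(a, tiled) == reference
```

Here a schedule only changes the order in which points are visited, which is enough to show that the definition of the computation stays untouched while the execution strategy varies. The mapping to parallel, distributed, or accelerator execution that the abstract mentions is outside the scope of this sketch.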

Research Organization: Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
Sponsoring Organization: USDOE Office of Science (SC), Nuclear Physics (NP)
DOE Contract Number: SC0018121
OSTI ID: 1971643
Report Number(s): DE-SC0018121
Country of Publication: United States
Language: English