AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators

Bohm Agostini, Nicolas; Gibson, Perry; Haris, Jude; Jayaweera, Malith; Rubin, Norm; Tumeo, Antonino; Abellán, José L.; Cano, José; Kaeli, David

doi:10.1109/CGO57630.2024.10444801

Conference · Fri Apr 05 04:00:00 EDT 2024

DOI: https://doi.org/10.1109/CGO57630.2024.10444801 · OSTI ID: 2409170

Bohm Agostini, Nicolas [1]; Gibson, Perry [2]; Haris, Jude [2]; Jayaweera, Malith [3]; Rubin, Norm [3]; Tumeo, Antonino [1]; Abellán, José L. [4]; Cano, José [2]; Kaeli, David [3]

[1] BATTELLE (PACIFIC NW LAB)
[2] University of Glasgow
[3] Northeastern University
[4] Universidad de Murcia

Tensor algebra operations represent an important class of algorithms used across many applications, including machine learning, scientific computing, and data analytics. As a result, the efficient generation of custom accelerators for tensor operations has received increased attention. Previous efforts have produced automated tools enabling users to prototype and explore optimized accelerators. However, little effort has been focused on the host-accelerator interaction in these tools. Efficient use of hardware accelerators requires knowledge about the accelerator's capabilities (operations, data formats, and opcode support), the host CPU microarchitecture (e.g., memory hierarchy), the host-accelerator interface, and the application's features (which code regions should be mapped onto an accelerator). Manually rewriting the original applications to facilitate improved custom accelerator mapping is an error-prone and time-consuming endeavor. To cope with this, we propose AXI4MLIR, a new framework to automatically generate and optimize the communication between the host CPU and arbitrary accelerators that implement linear algebra algorithms. AXI4MLIR extends the MLIR compiler framework to automatically generate efficient host-accelerator driver code for accelerators with AXI-based interfaces. Our compiler extensions enable automatic driver code generation while carefully considering the host's memory hierarchy and target accelerator features. To demonstrate the flexibility and utility of AXI4MLIR, we test it with diverse use cases that include different types of accelerators, tiling scenarios, and dataflow schemes. We compare our experimental results to manual implementations of host-accelerator driver code and find that our approach can reduce CPU cache references by 56% and deliver up to a 1.65x speedup.

Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 2409170

Report Number(s): PNNL-SA-184683

Country of Publication: United States

Language: English

Similar Records

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

Conference · Mon Oct 18 00:00:00 EDT 2021 · OSTI ID:1972822

A High Performance Sparse Tensor Algebra Compiler in MLIR

Conference · Sun Dec 19 23:00:00 EST 2021 · OSTI ID:1855960

An MLIR-based Compiler Flow for System-Level Design and Hardware Acceleration

Conference · Wed Dec 21 23:00:00 EST 2022 · OSTI ID:1909788
