U.S. Department of Energy
Office of Scientific and Technical Information

FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators

Conference
Authors: [1]; [2]; [2]; [2]; [3]; [1]
  1. University of Utah
  2. Pacific Northwest National Laboratory (PNNL)
  3. Lawrence Livermore National Laboratory
While NVIDIA has long been the dominant provider of GPUs for HPC and ML, AMD now has several GPU offerings. This encourages programmers to try AMD GPUs for new codes and to port existing codes. Unfortunately, without an understanding of the floating-point differences between these GPU types, software development or porting can introduce bugs, and such an understanding is currently lacking. The magnitude of this open question becomes clear if one considers the number of floating-point precision choices (FP16, FP32, etc.), floating-point formats (standard floats, brain-float, etc.), and execution units available (elementary units, matrix/tensor cores, etc.). Questions such as which rounding modes are used and whether subnormals are supported are also important. Most of these answers are unknown today or are hard to access. We provide the first testing-guided approach that answers a significant number of these questions. We also devise tests to reveal internal information (e.g., the number of extra bits kept) to make sure that our findings are reliable. Many of our tests employ systematically generated random programs, others apply fast-math flags, and some involve fused multiply-add (FMA). Especially for tensor/matrix cores, the tests have nontrivial logic that we present. Our testing approach is reusable for the plethora of GPUs yet to be introduced. Our findings include up to 7 ulps of difference between NVIDIA and AMD for sin and cos at FP32 precision and 3 ulps at FP64. In our study of tensor cores (NVIDIA) and matrix cores (AMD), we have extensively characterized rounding modes (truncation versus round-to-nearest), the number of extra internal bits kept (whether 3 extra bits are kept or not), and subnormal support for inputs and outputs, across four different floating-point formats on NVIDIA A100 and AMD MI250X GPUs. We believe that this wealth of data, available for the first time, may help avoid significant porting bugs when migrating code across these platforms.
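To illustrate the kind of feature-targeted probe the abstract describes, below is a minimal CUDA sketch, not the paper's actual FTTN harness, that checks whether FP16 subnormal inputs survive a tensor-core multiply on an NVIDIA GPU (sm_70 or newer). It feeds the smallest FP16 subnormal, 2^-24, through a 16x16x16 wmma multiply against a matrix whose only nonzero entry is 1.0, then checks whether the FP32 result is flushed to zero. The matrix shape, the pass/fail printout, and the omission of error checking are illustrative choices.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// One warp performs a single 16x16x16 tensor-core multiply: D = A*B + 0.
__global__ void probe_subnormal(const half *a, const half *b, float *d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;
    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);
    wmma::store_matrix_sync(d, fc, 16, wmma::mem_row_major);
}

int main() {
    half ha[256], hb[256];
    for (int i = 0; i < 256; ++i) {
        ha[i] = __float2half(0.0f);
        hb[i] = __float2half(0.0f);
    }
    // A[0][0] = 2^-24, the smallest FP16 subnormal; B[0][0] = 1.0.
    // D[0][0] = A[0][0] * B[0][0] should equal 2^-24 (a normal FP32
    // value) unless the tensor core flushes subnormal inputs to zero.
    ha[0] = __float2half(ldexpf(1.0f, -24));
    hb[0] = __float2half(1.0f);

    half *da, *db;
    float *dd;
    cudaMalloc(&da, sizeof(ha));
    cudaMalloc(&db, sizeof(hb));
    cudaMalloc(&dd, 256 * sizeof(float));
    cudaMemcpy(da, ha, sizeof(ha), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, sizeof(hb), cudaMemcpyHostToDevice);

    probe_subnormal<<<1, 32>>>(da, db, dd);   // exactly one warp

    float out[256];
    cudaMemcpy(out, dd, sizeof(out), cudaMemcpyDeviceToHost);
    printf("D[0][0] = %g -> subnormal FP16 inputs %s\n", out[0],
           out[0] == 0.0f ? "flushed to zero" : "preserved");

    cudaFree(da); cudaFree(db); cudaFree(dd);
    return 0;
}
```

Compiled with, e.g., nvcc -arch=sm_70, this probes one feature on one device; an analogous HIP/rocWMMA version would exercise the MI250X matrix cores, and the paper's tests extend this pattern to rounding modes and extra-bit detection by choosing inputs whose products differ depending on how intermediate bits are rounded.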
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
2539787
Report Number(s):
PNNL-SA-190978
Country of Publication:
United States
Language:
English

Similar Records

FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators
Software · Aug 2024 · OSTI ID: code-145775

Optimization and Portability of a Fusion OpenACC-based FORTRAN HPC Code from NVIDIA to AMD GPUs
Conference · Jul 2023 · OSTI ID: 2301616

Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems
Journal Article · Nov 2020 · Proceedings of the Royal Society. A. Mathematical, Physical and Engineering Sciences · OSTI ID: 1787013
