Extreme-Scale Algorithms & Software Resilience (EASIR) Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures

Demmel, James W.

doi:10.2172/1395330

Technical Report · Thu Sep 14 00:00:00 EDT 2017

DOI: https://doi.org/10.2172/1395330 · OSTI ID: 1395330

Univ. of California, Berkeley, CA (United States)

This project addresses both communication-avoiding algorithms and reproducible floating-point computation. Communication, i.e., moving data either between levels of memory or between processors over a network, is much more expensive per operation than arithmetic (measured in time or energy), so we seek algorithms that greatly reduce communication. We developed many new algorithms for dense and sparse linear algebra, both direct and iterative, attaining new communication lower bounds and getting large speedups in many cases. We also extended this work in several ways: (1) We minimize writes separately from reads, since writes may be much more expensive than reads on emerging memory technologies such as Flash, sometimes doing asymptotically fewer writes than reads. (2) We extend the lower bounds and optimal algorithms to arbitrary algorithms that may be expressed as perfectly nested loops accessing arrays, where the array subscripts may be arbitrary affine functions of the loop indices (e.g., A(i), B(i, j+k, k+3*m-7, …), etc.). (3) We extend our communication-avoiding approach to some machine learning algorithms, such as support vector machines. This work has won a number of awards.

We also address reproducible floating-point computation. We define reproducibility to mean getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should ideally not change the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with the nonassociativity of floating-point addition, makes attaining reproducibility a challenge even for simple operations like summing a vector of numbers, and for more complicated operations like the Basic Linear Algebra Subprograms (BLAS). We describe an algorithm that computes a reproducible sum of floating-point numbers, independent of the order of summation. The algorithm depends only on a subset of the IEEE Floating Point Standard 754-2008, uses just 6 words to represent a “reproducible accumulator,” and requires just one read-only pass over the data, or one reduction in parallel. New instructions based on this work are being considered for inclusion in the future IEEE 754-2018 floating-point standard, and new reproducible BLAS are being considered for the next version of the BLAS standard.
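
The reproducibility problem described above can be illustrated with a minimal Python sketch. It is not the report's algorithm: the report's method uses a 6-word reproducible accumulator, whereas this sketch swaps in Python's standard `math.fsum`, which returns a correctly rounded sum and is therefore independent of summation order on platforms with IEEE 754 doubles. The helper `naive_sum` and the sample values are hypothetical and chosen only to make the order dependence visible.

```python
import math
import random


def naive_sum(values):
    """Left-to-right floating-point sum; the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers in two different orders (illustrative values only).
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1.0, 1.0, 1e16, -1e16]

# Floating-point addition is not associative: in order_a the first 1.0 is
# rounded away when added to 1e16, so the two naive sums disagree.
naive_a = naive_sum(order_a)  # 1.0
naive_b = naive_sum(order_b)  # 2.0

# math.fsum is a stand-in (an assumption, not the report's method): it
# returns the correctly rounded exact sum, so the order does not matter.
exact_a = math.fsum(order_a)  # 2.0
exact_b = math.fsum(order_b)  # 2.0

# Mimic a dynamically scheduled reduction by shuffling the order repeatedly.
rng = random.Random(0)
shuffled_results = set()
for _ in range(100):
    trial = order_a[:]
    rng.shuffle(trial)
    shuffled_results.add(math.fsum(trial))

print(naive_a, naive_b)   # the naive sums differ between orderings
print(exact_a, exact_b)   # the correctly rounded sums agree
print(shuffled_results)   # a single value across all shuffles
```

A correctly rounded sum is one way to obtain order independence, but it generally needs more bookkeeping than a fixed-size accumulator; the abstract's emphasis on a 6-word accumulator and a single pass reflects that trade-off.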

Research Organization: Univ. of California, Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: SC0010200

OSTI ID: 1395330

Report Number(s): DOE-BERKELEY-0001

Country of Publication: United States

Language: English

Similar Records

Algorithms for Efficient Reproducible Floating Point Summation

Journal Article · Fri Sep 25 00:00:00 EDT 2020 · ACM Transactions on Mathematical Software · OSTI ID:1395330

Ahrens, Peter; Demmel, James; Nguyen, Hong Diep

Resiliency in numerical algorithm design for extreme scale simulations

Journal Article · Fri Dec 10 00:00:00 EST 2021 · International Journal of High Performance Computing Applications · OSTI ID:1395330

Agullo, Emmanuel; Altenbernd, Mirco; Anzt, Hartwig; +33 more

Exploiting data representation for fault tolerance

Journal Article · Tue Jan 06 00:00:00 EST 2015 · Journal of Computational Science · OSTI ID:1395330

Hoemmen, Mark Frederick; Elliott, J.; Mueller, F.

Related Subjects

97 MATHEMATICS AND COMPUTING
