Algorithms for Efficient Reproducible Floating Point Summation

Ahrens, Peter; Demmel, James; Nguyen, Hong Diep

doi:10.1145/3389360


Journal Article · September 25, 2020 · ACM Transactions on Mathematical Software

DOI: https://doi.org/10.1145/3389360 · OSTI ID: 1800924

Ahrens, Peter [1]; Demmel, James [2]; Nguyen, Hong Diep [2]

[1] Massachusetts Institute of Technology, Cambridge, MA, USA
[2] University of California Berkeley, Berkeley, CA, USA

We define “reproducibility” as getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should not affect the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociative floating point addition, makes reproducibility challenging even for summation, or operations like the BLAS. We describe a “reproducible accumulator” data structure (the “binned number”) and associated algorithms to reproducibly sum binary floating point numbers, independent of summation order. We use a subset of the IEEE Floating Point Standard 754-2008 and bitwise operations on the standard representations in memory. Our approach requires only one read-only pass over the data, and one reduction in parallel, using a 6-word reproducible accumulator (more words can be used for higher accuracy), enabling standard tiling optimization techniques. Summing n words with a 6-word reproducible accumulator requires approximately 9n floating point operations (arithmetic, comparison, and absolute value) and approximately 3n bitwise operations. The final error bound with a 6-word reproducible accumulator and our default settings can be up to 2²⁹ times smaller than the error bound for conventional (recursive) summation on ill-conditioned double-precision inputs.
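
The order dependence described in the abstract can be reproduced in a few lines. The Python sketch below is an illustration only and is not the binned-number algorithm of the paper: it swaps in an exact integer accumulator (every finite double is an integer multiple of 2^-1074) to show what an order-independent sum looks like. The names naive_sum, to_fixed, and order_independent_sum are hypothetical and chosen for this example.

```python
import random

SCALE_BITS = 1074  # every finite double is an integer multiple of 2**-1074


def naive_sum(values):
    """Conventional (recursive) summation: the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def to_fixed(x: float) -> int:
    """Return x * 2**1074 as an exact Python int (x must be finite)."""
    num, den = x.as_integer_ratio()  # den is a power of two, at most 2**1074
    return num * ((1 << SCALE_BITS) // den)


def order_independent_sum(values) -> float:
    """Illustrative stand-in, NOT the paper's method: sum exactly, round once."""
    total = sum(to_fixed(v) for v in values)  # integer addition is associative
    return total / (1 << SCALE_BITS)          # int / int is correctly rounded


# Floating point addition is not associative: reordering changes the answer.
print(naive_sum([1e16, 1.0, -1e16]))   # 0.0
print(naive_sum([1e16, -1e16, 1.0]))   # 1.0

# The integer accumulator gives the same bits for both orders.
print(order_independent_sum([1e16, 1.0, -1e16]))  # 1.0
print(order_independent_sum([1e16, -1e16, 1.0]))  # 1.0

# Shuffling a larger input never changes the order-independent result.
random.seed(0)
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
          for _ in range(1000)]
reference = order_independent_sum(values)
for _ in range(5):
    random.shuffle(values)
    assert order_independent_sum(values) == reference
```

This sketch relies on arbitrary-precision integers, so its memory and time costs grow with the exponent range of the input. The approach summarized in the abstract instead keeps a fixed-size 6-word accumulator and uses only floating point and bitwise operations in a single pass.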


Research Organization: Krell Institute, Ames, IA (United States); Univ. of California, Oakland, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: FG02-97ER25308; SC0008699; SC0008700; SC0010200; AC02-05CH11231

OSTI ID: 1800924

Journal Information: ACM Transactions on Mathematical Software, Vol. 46, Issue 3; ISSN 0098-3500

Publisher: Association for Computing Machinery

Country of Publication: United States

Language: English

