Optimizing the inner loop of the gravitational force interaction on modern processors
- Los Alamos National Laboratory
We have achieved superior performance on multiple generations of the fastest supercomputers in the world with our hashed oct-tree N-body code (HOT), spanning almost two decades and garnering multiple Gordon Bell Prizes for significant achievement in parallel processing. Execution time for our N-body code is largely determined by the force calculation in the inner loop. Improvements to the inner loop using SSE3 instructions have enabled the calculation of over 200 million gravitational interactions per second per processor on a 2.6 GHz Opteron, for a computational rate of over 7 Gflops in single precision (70% of peak). We obtain optimal performance on some processors (including the Cell) by decomposing the reciprocal square root function required for a gravitational interaction into a table lookup, Chebyshev polynomial interpolation, and Newton-Raphson iteration, using the algorithm of Karp. By unrolling the loop by a factor of six, and using SPU intrinsics to compute on vectors, we obtain performance of over 16 Gflops on a single Cell SPE. Aggregated over the 8 SPEs on a Cell processor, the overall performance is roughly 130 Gflops. In comparison, the ordinary C version of our inner loop obtains only 1.6 Gflops per SPE with the spuxlc compiler.
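To illustrate the shape of the inner loop described above, the following is a minimal scalar C sketch, not the HOT code itself. It assumes a cheap seed for the reciprocal square root followed by Newton-Raphson refinement. The seed here is the well-known integer bit-shift approximation, used only as a stand-in for the table lookup and Chebyshev interpolation stages of Karp's algorithm, which the abstract does not spell out. The names `body_t`, `rsqrt_seed`, `rsqrt_nr`, and `accumulate`, the softening parameter `eps2`, and the choice of units with G = 1 are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical source particle: position and mass. */
typedef struct { float x, y, z, m; } body_t;

/* Rough initial guess for 1/sqrt(x) from the float's bit pattern.
 * Assumption: this replaces the table lookup + polynomial interpolation
 * stage; it is NOT Karp's seed, only a compact stand-in. */
static float rsqrt_seed(float x)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret without aliasing UB */
    bits = 0x5f3759dfu - (bits >> 1);    /* halve and negate the exponent */
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Refine the seed with two Newton-Raphson steps for f(y) = 1/y^2 - x.
 * Each step roughly squares the relative error of the estimate. */
static float rsqrt_nr(float x)
{
    assert(x > 0.0f);                    /* valid only for positive input */
    float y = rsqrt_seed(x);
    y = y * (1.5f - 0.5f * x * y * y);
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}

/* Accumulate the acceleration at pos[] due to n sources (G = 1).
 * eps2 is an assumed softening term; with eps2 == 0 the caller must
 * ensure that no source coincides with pos[]. */
static void accumulate(const float pos[3], const body_t *src, size_t n,
                       float eps2, float acc[3])
{
    float ax = 0.0f, ay = 0.0f, az = 0.0f;
    for (size_t j = 0; j < n; j++) {
        float dx = src[j].x - pos[0];
        float dy = src[j].y - pos[1];
        float dz = src[j].z - pos[2];
        float r2 = dx * dx + dy * dy + dz * dz + eps2;
        float rinv = rsqrt_nr(r2);                   /* 1/r */
        float mr3 = src[j].m * rinv * rinv * rinv;   /* m/r^3 */
        ax += mr3 * dx;
        ay += mr3 * dy;
        az += mr3 * dz;
    }
    acc[0] = ax;
    acc[1] = ay;
    acc[2] = az;
}
```

A vectorized version along the lines of the abstract would process several sources per iteration with SSE or SPU intrinsics and unroll the loop so that independent reciprocal square roots overlap in the pipeline. The scalar form above only shows the arithmetic that each vector lane would perform.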
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC52-06NA25396
- OSTI ID:
- 1043457
- Report Number(s):
- LA-UR-10-08134; LA-UR-10-8134; TRN: US1203233
- Resource Relation:
- Conference: The Future of AstroComputing; December 16, 2010; San Diego, CA
- Country of Publication:
- United States
- Language:
- English
Similar Records
The Impact of IBM Cell Technology on the Programming Paradigm in the Context of Computer Systems for Climate and Weather Models
369 TFlop/s molecular dynamics simulations on the Roadrunner general-purpose heterogeneous supercomputer
Related Subjects
GENERAL PHYSICS
72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
ACCURACY
ALGORITHMS
COMPUTER CODES
GRAVITATIONAL INTERACTIONS
INTERACTIONS
INTERPOLATION
PARALLEL PROCESSING
PERFORMANCE
POLYNOMIALS
ROOTS
SOLAR PROTONS
SUPERCOMPUTERS
VECTORS