OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Optimizing the inner loop of the gravitational force interaction on modern processors

Conference · OSTI ID: 1043457
 [1]
  1. Los Alamos National Laboratory

We have achieved superior performance on multiple generations of the fastest supercomputers in the world with our hashed oct-tree N-body code (HOT), spanning almost two decades and garnering multiple Gordon Bell Prizes for significant achievement in parallel processing. Execution time for our N-body code is largely determined by the force calculation in the inner loop. Improvements to the inner loop using SSE3 instructions have enabled the calculation of over 200 million gravitational interactions per second per processor on a 2.6 GHz Opteron, for a computational rate of over 7 Gflops in single precision (70% of peak). We obtain optimal performance on some processors (including the Cell) by decomposing the reciprocal square root function required for a gravitational interaction into a table lookup, Chebychev polynomial interpolation, and Newton-Raphson iteration, using the algorithm of Karp. By unrolling the loop by a factor of six, and using SPU intrinsics to compute on vectors, we obtain performance of over 16 Gflops on a single Cell SPE. Aggregated over the 8 SPEs on a Cell processor, the overall performance is roughly 130 Gflops. In comparison, the ordinary C version of our inner loop only obtains 1.6 Gflops per SPE with the spuxlc compiler.
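
To make the structure of such an inner loop concrete, the following is a minimal scalar sketch in plain C, not the HOT code itself and not Karp's algorithm as published. It shows a reciprocal square root built from a small seed table followed by Newton-Raphson refinement, and a force loop that calls it once per interaction. The polynomial interpolation stage named in the abstract is omitted here, so a coarser table and three Newton-Raphson steps stand in for it. The function names (`rsqrt_nr`, `gravity_inner`), the six-entry table, the softening parameter `eps2`, the structure-of-arrays layout, and the choice of units with G = 1 are all assumptions made for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical seed table: 1/sqrt(m) at the midpoints of six intervals
 * of width 0.25 that together cover m in [0.5, 2). */
static const float rsqrt_seed[6] = {
    1.264911f, 1.069045f, 0.942809f, 0.852803f, 0.784465f, 0.730297f
};

/* Approximate 1/sqrt(x) for finite x > 0 using range reduction,
 * a table lookup, and Newton-Raphson iteration. */
static float rsqrt_nr(float x)
{
    int e, i;
    float m, y;

    assert(x > 0.0f && isfinite(x));

    /* Range reduction: x = m * 2^e with m in [0.5, 1). */
    m = frexpf(x, &e);
    if (e & 1) {            /* force e even so that 2^(-e/2) is exact */
        m *= 2.0f;          /* m is now in [1, 2) */
        e -= 1;
    }

    /* Table lookup gives a seed within roughly 11% of 1/sqrt(m). */
    y = rsqrt_seed[(int)((m - 0.5f) * 4.0f)];

    /* Newton-Raphson on f(y) = 1/y^2 - m; each step roughly squares
     * the relative error, so three steps approach single precision. */
    for (i = 0; i < 3; i++)
        y = y * (1.5f - 0.5f * m * y * y);

    /* Undo the range reduction: 1/sqrt(x) = 2^(-e/2) / sqrt(m). */
    return ldexpf(y, -e / 2);
}

/* Accumulate the acceleration and potential at position pos[] due to
 * n point masses, in units where G = 1. eps2 is an assumed softening
 * term added to r^2; it must be positive if any source can coincide
 * with pos[], since rsqrt_nr requires a positive argument. */
static void gravity_inner(size_t n,
                          const float *sx, const float *sy, const float *sz,
                          const float *mass, const float pos[3], float eps2,
                          float acc[3], float *phi)
{
    float ax = 0.0f, ay = 0.0f, az = 0.0f, p = 0.0f;
    size_t j;

    for (j = 0; j < n; j++) {
        float dx = sx[j] - pos[0];
        float dy = sy[j] - pos[1];
        float dz = sz[j] - pos[2];
        float r2 = dx * dx + dy * dy + dz * dz + eps2;
        float rinv = rsqrt_nr(r2);                 /* 1/r   */
        float mr3inv = mass[j] * rinv * rinv * rinv; /* m/r^3 */

        ax += dx * mr3inv;
        ay += dy * mr3inv;
        az += dz * mr3inv;
        p  -= mass[j] * rinv;
    }

    acc[0] = ax;
    acc[1] = ay;
    acc[2] = az;
    *phi = p;
}
```

In this sketch the reciprocal square root is the only non-trivial operation per interaction, which is why its decomposition matters for throughput. A vectorized version would replace the scalar arithmetic with SSE or SPU intrinsics and unroll the loop over `j`, as the abstract describes, but those details are not reproduced here.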

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC52-06NA25396
OSTI ID:
1043457
Report Number(s):
LA-UR-10-08134; LA-UR-10-8134; TRN: US1203233
Resource Relation:
Conference: The Future of AstroComputing ; December 16, 2010 ; San Diego, CA
Country of Publication:
United States
Language:
English