Massively Parallel QCD
Abstract
The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.
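The locality argument in the abstract can be made concrete with a small sketch (not from the paper; all lattice and machine sizes below are hypothetical examples): a global 4-D lattice is split evenly into per-node sublattices, each iteration exchanges only the sublattice surface with nearest neighbours, and the sustained-performance figure can be cross-checked against the published BlueGene/L per-core peak of 2.8 GFlop/s.

```python
# Illustrative sketch: domain decomposition of a 4-D LQCD lattice across a
# 3-D torus of compute nodes, and the surface data each node must exchange.
# Dimensions below are hypothetical, chosen only to divide evenly.

def sublattice(global_dims, node_grid):
    """Split a global lattice evenly across a grid of nodes; lattice
    dimensions beyond the node grid stay entirely local to each node."""
    local = list(global_dims)
    for i, p in enumerate(node_grid):
        assert local[i] % p == 0, "lattice must divide evenly among nodes"
        local[i] //= p
    return tuple(local)

def surface_to_volume(local_dims):
    """Fraction of sublattice sites on the surface -- a proxy for the
    nearest-neighbour communication each node performs per iteration."""
    volume = 1
    for d in local_dims:
        volume *= d
    interior = 1
    for d in local_dims:
        interior *= max(d - 2, 0)
    return (volume - interior) / volume

# Hypothetical example: a 64^3 x 32 lattice on a 16^3 node torus.
local = sublattice((64, 64, 64, 32), (16, 16, 16))
print(local)                                  # per-node block: (4, 4, 4, 32)
print(round(surface_to_volume(local), 3))     # communicated fraction

# Cross-check of the abstract's numbers, assuming 2.8 GFlop/s peak per core:
peak_tflops = 131072 * 2.8e9 / 1e12
print(round(70.5 / peak_tflops, 2))           # sustained fraction, ~0.19
```

Because only the surface is communicated while the full volume is computed, larger per-node blocks push the surface-to-volume ratio down, which is why a torus network with only nearest-neighbour links suffices for this workload.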
 Authors:
 Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampapa, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G
 Publication Date:
 2007-04-11
 Research Org.:
 Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
 Sponsoring Org.:
 USDOE
 OSTI Identifier:
 940899
 Report Number(s):
 UCRL-JRNL-229921
Journal ID: ISSN 0018-8646; IBMJAE; TRN: US0807241
 DOE Contract Number:
 W-7405-ENG-48
 Resource Type:
 Journal Article
 Resource Relation:
 Journal Name: IBM Journal of Research and Development, vol. 52, no. 1/2, December 11, 2007, pp. 189; Journal Volume: 52; Journal Issue: 1/2
 Country of Publication:
 United States
 Language:
 English
 Subject:
 72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS; 99 GENERAL AND MISCELLANEOUS; ARCHITECTURE; NUCLEAR FORCES; PERFORMANCE; PHYSICS; PROGRAMMING; QUANTUM CHROMODYNAMICS; STRONG INTERACTIONS; SUPERCOMPUTERS
Citation Formats
Soltz, R, Vranas, P, Blumrich, M, Chen, D, Gara, A, Giampapa, M, Heidelberger, P, Salapura, V, Sexton, J, and Bhanot, G. Massively Parallel QCD. United States: N. p., 2007.
Web.
Soltz, R, Vranas, P, Blumrich, M, Chen, D, Gara, A, Giampapa, M, Heidelberger, P, Salapura, V, Sexton, J, & Bhanot, G. Massively Parallel QCD. United States.
Soltz, R, Vranas, P, Blumrich, M, Chen, D, Gara, A, Giampapa, M, Heidelberger, P, Salapura, V, Sexton, J, and Bhanot, G. 2007.
"Massively Parallel QCD". United States.
https://www.osti.gov/servlets/purl/940899.
@article{osti_940899,
title = {Massively Parallel QCD},
author = {Soltz, R and Vranas, P and Blumrich, M and Chen, D and Gara, A and Giampapa, M and Heidelberger, P and Salapura, V and Sexton, J and Bhanot, G},
abstractNote = {The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.},
doi = {},
journal = {IBM Journal of Research and Development},
pages = {189},
number = 1/2,
volume = 52,
place = {United States},
year = {2007},
month = {apr}
}
