skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Quantum Monte Carlo Endstation for Petascale Computing

Abstract

The major achievements enabled by QMC Endstation grant include * Performance improvement on clusters of x86 multi-core systems, especially on Cray XT systems * New and improved methods for the wavefunction optimizations * New forms of trial wavefunctions * Implementation of the full application on NVIDIA GPUs using CUDA The scaling studies of QMCPACK on large-scale systems show excellent parallel efficiency up to 216K cores on Jaguarpf (Cray XT5). The GPU implementation shows speedups of 10-15x over the CPU implementation on older generation of x86. We have implemented hybrid OpenMP/MPI scheme in QMC to take advantage of multi-core shared memory processors of petascale systems. Our hybrid scheme has several advantages over the standard MPI-only scheme. * Memory optimized: large read-only data to store one-body orbitals and other shared properties to represent the trial wave function and many-body Hamiltonian can be shared among threads, which reduces the memory footprint of a large-scale problem. * Cache optimized: the data associated with an active Walker are in cache during the compute-intensive drift-diffusion process and the operations on an Walker are optimized for cache reuse. Thread-local objects are used to ensure the data affinity to a thread. * Load balanced: Walkers in an ensemblemore » are evenly distributed among threads and MPI tasks. The two-level parallelism reduces the population imbalance among MPI tasks and reduces the number of point-to-point communications of large messages (serialized objects) for the Walker exchange. * Communication optimized: the communication overhead, especially for the collective operations necessary to determine ET and measure the properties of an ensemble, is significantly lowered by using less MPI tasks. The multiple forms of parallelism afforded by QMC algorithms make them ideal candidates for acceleration in the many-core paradigm. We presented the results of our effort to port the QMCPACK simulation code to the NVIDIA CUDA GPU platform. We restructured the CPU algorithms to express additional parallelism, minimize GPU-CPU communication, and efficiently utilize the GPU memory hierarchy. Using mixed precision on GT200 GPUs and MPI for intercommunication and load balancing, we observe typical full-application speedups of approximately 10x to 15x relative to quad-core Xeon CPUs alone, while reproducing the double-precision CPU results within statistical error. We developed an all-electron quantum Monte Carlo (QMC) method for solids that does not rely on pseudopotentials, and used it to construct a primary ultra-high-pressure calibration based on the equation of state of cubic boron nitride. We computed the static contribution to the free energy with the QMC method and obtained the phonon contribution from density functional theory, yielding a high-accuracy calibration up to 900 GPa usable directly in experiment. We computed the anharmonic Raman frequency shift with QMC simulations as a function of pressure and temperature, allowing optical pressure calibration. In contrast to present experimental approaches, small systematic errors in the theoretical EOS do not increase with pressure, and no extrapolation is needed. This all-electron method is applicable to first-row solids, providing a new reference for ab initio calculations of solids and benchmarks for pseudopotential accuracy. We compared experimental and theoretical results on the momentum distribution and the quasiparticle renormalization factor in sodium. From an x-ray Compton-profile measurement of the valence-electron momentum density, we derived its discontinuity at the Fermi wavevector finding an accurate measure of the renormalization factor that we compared with quantum-Monte-Carlo and G0W0 calculations performed both on crystalline sodium and on the homogeneous electron gas. Our calculated results are in good agreement with the experiment. We have been studying the heat of formation for various Kubas complexes of molecular hydrogen on Ti(1,2)ethylene-nH2 using Diffusion Monte Carlo. This work has been started and is ongoing. We are studying systems involving 1 and 2 Ti bonding sites with up to 10 hydrogen molecules in numerous configurations. This work will establish a benchmark that will test the accuracy of density functional calculations and establish the feasibility of our methods for similar systems.« less

Authors:
Publication Date:
Research Org.:
University of Illinois Urbana-Champaign
Sponsoring Org.:
USDOE Office of Science and Technology (EM-50)
OSTI Identifier:
1007216
Report Number(s):
DOE/OR23335-F
TRN: US201108%%106
DOE Contract Number:  
FG05-08OR23335
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; ACCELERATION; AFFINITY; ALGORITHMS; BENCHMARKS; BONDING; BORON NITRIDES; CALIBRATION; DIFFUSION; ELECTRON GAS; EXTRAPOLATION; FORMATION HEAT; FREE ENERGY; FUNCTIONALS; HAMILTONIANS; HYDROGEN; IMPLEMENTATION; PHONONS; RENORMALIZATION; SODIUM; WAVE FUNCTIONS; Quantum Monte Carlo; high performance computing; endstation; Graphics Processing Units, message passing interface; compton scattering, sodium; benzene

Citation Formats

Ceperley, David. Quantum Monte Carlo Endstation for Petascale Computing. United States: N. p., 2011. Web. doi:10.2172/1007216.
Ceperley, David. Quantum Monte Carlo Endstation for Petascale Computing. United States. doi:10.2172/1007216.
Ceperley, David. Wed . "Quantum Monte Carlo Endstation for Petascale Computing". United States. doi:10.2172/1007216. https://www.osti.gov/servlets/purl/1007216.
@article{osti_1007216,
title = {Quantum Monte Carlo Endstation for Petascale Computing},
author = {Ceperley, David},
abstractNote = {The major achievements enabled by QMC Endstation grant include * Performance improvement on clusters of x86 multi-core systems, especially on Cray XT systems * New and improved methods for the wavefunction optimizations * New forms of trial wavefunctions * Implementation of the full application on NVIDIA GPUs using CUDA The scaling studies of QMCPACK on large-scale systems show excellent parallel efficiency up to 216K cores on Jaguarpf (Cray XT5). The GPU implementation shows speedups of 10-15x over the CPU implementation on older generation of x86. We have implemented hybrid OpenMP/MPI scheme in QMC to take advantage of multi-core shared memory processors of petascale systems. Our hybrid scheme has several advantages over the standard MPI-only scheme. * Memory optimized: large read-only data to store one-body orbitals and other shared properties to represent the trial wave function and many-body Hamiltonian can be shared among threads, which reduces the memory footprint of a large-scale problem. * Cache optimized: the data associated with an active Walker are in cache during the compute-intensive drift-diffusion process and the operations on an Walker are optimized for cache reuse. Thread-local objects are used to ensure the data affinity to a thread. * Load balanced: Walkers in an ensemble are evenly distributed among threads and MPI tasks. The two-level parallelism reduces the population imbalance among MPI tasks and reduces the number of point-to-point communications of large messages (serialized objects) for the Walker exchange. * Communication optimized: the communication overhead, especially for the collective operations necessary to determine ET and measure the properties of an ensemble, is significantly lowered by using less MPI tasks. The multiple forms of parallelism afforded by QMC algorithms make them ideal candidates for acceleration in the many-core paradigm. We presented the results of our effort to port the QMCPACK simulation code to the NVIDIA CUDA GPU platform. We restructured the CPU algorithms to express additional parallelism, minimize GPU-CPU communication, and efficiently utilize the GPU memory hierarchy. Using mixed precision on GT200 GPUs and MPI for intercommunication and load balancing, we observe typical full-application speedups of approximately 10x to 15x relative to quad-core Xeon CPUs alone, while reproducing the double-precision CPU results within statistical error. We developed an all-electron quantum Monte Carlo (QMC) method for solids that does not rely on pseudopotentials, and used it to construct a primary ultra-high-pressure calibration based on the equation of state of cubic boron nitride. We computed the static contribution to the free energy with the QMC method and obtained the phonon contribution from density functional theory, yielding a high-accuracy calibration up to 900 GPa usable directly in experiment. We computed the anharmonic Raman frequency shift with QMC simulations as a function of pressure and temperature, allowing optical pressure calibration. In contrast to present experimental approaches, small systematic errors in the theoretical EOS do not increase with pressure, and no extrapolation is needed. This all-electron method is applicable to first-row solids, providing a new reference for ab initio calculations of solids and benchmarks for pseudopotential accuracy. We compared experimental and theoretical results on the momentum distribution and the quasiparticle renormalization factor in sodium. From an x-ray Compton-profile measurement of the valence-electron momentum density, we derived its discontinuity at the Fermi wavevector finding an accurate measure of the renormalization factor that we compared with quantum-Monte-Carlo and G0W0 calculations performed both on crystalline sodium and on the homogeneous electron gas. Our calculated results are in good agreement with the experiment. We have been studying the heat of formation for various Kubas complexes of molecular hydrogen on Ti(1,2)ethylene-nH2 using Diffusion Monte Carlo. This work has been started and is ongoing. We are studying systems involving 1 and 2 Ti bonding sites with up to 10 hydrogen molecules in numerous configurations. This work will establish a benchmark that will test the accuracy of density functional calculations and establish the feasibility of our methods for similar systems.},
doi = {10.2172/1007216},
journal = {},
number = ,
volume = ,
place = {United States},
year = {2011},
month = {3}
}