OSTI.GOV · U.S. Department of Energy, Office of Scientific and Technical Information

Title: Steps toward fault-tolerant quantum chemistry.

Technical Report · DOI: https://doi.org/10.2172/992330 · OSTI ID: 992330

Developing quantum chemistry programs on the coming generation of exascale computers will be a difficult task. The programs will need to be fault-tolerant and to minimize the use of global operations. This work explores a task-based model that uses a data-centric approach to allocate work to processes, as it applies to quantum chemistry. After introducing the key problems that appear when trying to parallelize a complicated quantum chemistry method such as coupled-cluster theory, we discuss the implications of that model for the computational kernel of a coupled-cluster program: matrix multiplication. We also discuss the extensions that would be required to build a full coupled-cluster program within the task-based model.

Current programming models for high-performance computing are fault-intolerant and rely on global operations. Those properties are unsustainable as computers scale to millions of CPUs; instead, one must recognize that these systems will be hierarchical in structure and prone to constant faults, and that global operations will be infeasible. The FAST-OS HARE project is introducing a scale-free computing model to address these issues. This model is hierarchical and fault-tolerant by design; it allows clean overlap of computation and communication (reducing the network load), does not require checkpointing, and avoids the complexity of many HPC runtimes. Developing an algorithm within this model requires a change in focus from imperative programming to a data-centric approach.

Quantum chemistry (QC) algorithms, in particular electronic structure methods, are an ideal test bed for this computing model. These methods describe the distribution of electrons in a molecule, which determines the properties of the molecule. Their computational cost is high, scaling quartically or worse with the size of the molecule, which is why QC applications are major users of HPC resources. The complexity of these algorithms means that MPI alone is insufficient to achieve parallel scaling; QC developers have been forced to use alternative approaches to achieve scalability and would be receptive to radical shifts in the programming paradigm.

Initial work in adapting the simplest QC method, Hartree-Fock, to this new programming model indicates that the approach is beneficial for QC applications. However, the advantages of scaling to exascale computers are greatest for the most computationally expensive algorithms; within QC these are the high-accuracy coupled-cluster (CC) methods. Parallel coupled-cluster programs are available, but they are based on the conventional MPI paradigm, and much of their effort is spent handling the complicated data dependencies between processors, especially as the problem size grows. The current paradigm will not survive the move to exascale computers.

Here we discuss the initial steps toward designing and implementing a CC method within this model. First, we introduce the general concepts behind a CC method, focusing on the aspects that make these methods difficult to parallelize with conventional techniques. Then we express the computational core of the CC method, a matrix multiply, within the task-based approach that the FAST-OS project is designed to exploit. Finally, we outline the general setup for implementing the simplest CC method, linearized CC doubles (LinCC), in this model.
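As a concrete illustration of the data-centric task model described above, the following is a minimal sketch of a tiled matrix multiply in which each task is named by the output tile it produces rather than by a process rank. The tile size, the helper names, and the use of Python's concurrent.futures as a stand-in for the FAST-OS HARE runtime are illustrative assumptions, not details from the report.

```python
# Minimal sketch of a data-centric, task-based tiled matrix multiply.
# concurrent.futures stands in for the report's scale-free runtime;
# TILE and the helper names are illustrative, not from the report.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

TILE = 64  # illustrative tile edge length

def tiles(n):
    """Yield (start, stop) index ranges covering 0..n in TILE-sized blocks."""
    for i in range(0, n, TILE):
        yield i, min(i + TILE, n)

def compute_tile(a, b, ri, rj):
    """Task: own one output tile C[ri, rj] and pull the A/B tiles it needs.
    The task is identified by the data it produces (data-centric), not by
    a process rank, so it can run on any worker."""
    (i0, i1), (j0, j1) = ri, rj
    c = np.zeros((i1 - i0, j1 - j0))
    for k0, k1 in tiles(a.shape[1]):
        c += a[i0:i1, k0:k1] @ b[k0:k1, j0:j1]
    return (ri, rj), c

def matmul(a, b, workers=4):
    c = np.zeros((a.shape[0], b.shape[1]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futs = [pool.submit(compute_tile, a, b, ri, rj)
                for ri in tiles(a.shape[0])
                for rj in tiles(b.shape[1])]
        for f in futs:  # no global barrier: tiles land as they complete
            ((i0, i1), (j0, j1)), blk = f.result()
        	c[i0:i1, j0:j1] = blk
    return c
```

Because each tile task is a pure function of its inputs, completed tiles can arrive in any order, and no collective operation or global synchronization is required to assemble the result.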
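The same purity is what makes the model fault-tolerant without checkpointing: a task lost to a fault is simply re-executed, since re-running it cannot corrupt any global state. A hypothetical retry wrapper, again with concurrent.futures standing in for the runtime:

```python
# Hypothetical retry wrapper (illustrative, not from the report):
# because a tile task is a pure function (inputs -> output tile),
# a detected fault is handled by re-running the task rather than
# by restoring a global checkpoint.
def run_with_retry(pool, fn, *args, attempts=3):
    for attempt in range(attempts):
        try:
            return pool.submit(fn, *args).result()
        except Exception:  # stand-in for a detected node or task fault
            if attempt == attempts - 1:
                raise
```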
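For reference, linearized CC doubles keeps only the terms linear in the doubles amplitudes: it is the CCD method with the quadratic T2 term dropped (often identified with CEPA(0)). The report does not reproduce the equations here, so the following is the standard textbook form:

```latex
% Correlation energy and amplitude (residual) equations for LinCC:
% the CCD equations with the term quadratic in \hat{T}_2 omitted.
E_{\mathrm{corr}} = \tfrac{1}{4}\sum_{ijab}\langle ij\|ab\rangle\, t_{ij}^{ab},
\qquad
0 = \langle \Phi_{ij}^{ab} \,|\, \hat{H}_N\,(1+\hat{T}_2)\, |\, \Phi_0 \rangle .
```

Contracting the two-electron integrals with the doubles amplitudes in the residual is what reduces, after blocking over index ranges, to the large matrix multiplies that the task sketch above targets.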

Research Organization:
Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC04-94AL85000
OSTI ID:
992330
Report Number(s):
SAND2010-3388; TRN: US201022%%413
Country of Publication:
United States
Language:
English