PetaScale calculations of the electronic structures of nanostructures with hundreds of thousands of processors
Abstract
Density functional theory (DFT) is the most widely used ab initio method in materials simulations. It accounts for 75% of the NERSC allocation time in the materials science category. DFT can be used to calculate the electronic structure, the charge density, the total energy, and the atomic forces of a material system. With the advance of HPC power and new algorithms, DFT can now be used to study thousand-atom systems in some limited ways (e.g., a single self-consistent calculation without atomic relaxation). But there are many problems which either require much larger systems (e.g., >100,000 atoms) or many total-energy calculation steps (e.g., for molecular dynamics or atomic relaxation). Examples include: grain-boundary and dislocation energies and atomic structures, impurity transport and clustering in semiconductors, nanostructure growth, and the electronic structures of nanostructures and their internal electric fields. Due to the O(N³) scaling of the conventional DFT algorithms (as implemented in codes like Qbox, Paratec, and PEtot), these problems are beyond reach even for petascale computers. As the proposed petascale computers might have millions of processors, new computational paradigms and algorithms are needed to solve these large-scale problems. In particular, O(N) scaling algorithms with parallelization capability up to millions of processors are needed.
 Authors:
 Publication Date:
 Research Org.:
 Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
 Sponsoring Org.:
 USDOE Director, Office of Science, Advanced Scientific Computing Research
 OSTI Identifier:
 929688
 Report Number(s):
 LBNL-63793
R&D Project: KX1310; B&R: KJ0102000; TRN: US0806640
 DOE Contract Number:
 DE-AC02-05CH11231
 Resource Type:
 Technical Report
 Country of Publication:
 United States
 Language:
 English
 Subject:
 75; ALGORITHMS; CHARGE DENSITY; ELECTRIC FIELDS; ELECTRONIC STRUCTURE; NANOSTRUCTURES; SCALING
Citation Formats
Wang, Lin-Wang, Zhao, Zhengji, and Meza, Juan. PetaScale calculations of the electronic structures of nanostructures with hundreds of thousands of processors. United States: N. p., 2006.
Web. doi:10.2172/929688.
Wang, Lin-Wang, Zhao, Zhengji, & Meza, Juan. PetaScale calculations of the electronic structures of nanostructures with hundreds of thousands of processors. United States. doi:10.2172/929688.
Wang, Lin-Wang, Zhao, Zhengji, and Meza, Juan. 2006.
"PetaScale calculations of the electronic structures of nanostructures with hundreds of thousands of processors". United States.
doi:10.2172/929688. https://www.osti.gov/servlets/purl/929688.
@article{osti_929688,
title = {PetaScale calculations of the electronic structures of nanostructures with hundreds of thousands of processors},
author = {Wang, Lin-Wang and Zhao, Zhengji and Meza, Juan},
abstractNote = {Density functional theory (DFT) is the most widely used ab initio method in materials simulations. It accounts for 75% of the NERSC allocation time in the materials science category. DFT can be used to calculate the electronic structure, the charge density, the total energy, and the atomic forces of a material system. With the advance of HPC power and new algorithms, DFT can now be used to study thousand-atom systems in some limited ways (e.g., a single self-consistent calculation without atomic relaxation). But there are many problems which either require much larger systems (e.g., >100,000 atoms) or many total-energy calculation steps (e.g., for molecular dynamics or atomic relaxation). Examples include: grain-boundary and dislocation energies and atomic structures, impurity transport and clustering in semiconductors, nanostructure growth, and the electronic structures of nanostructures and their internal electric fields. Due to the O(N³) scaling of the conventional DFT algorithms (as implemented in codes like Qbox, Paratec, and PEtot), these problems are beyond reach even for petascale computers. As the proposed petascale computers might have millions of processors, new computational paradigms and algorithms are needed to solve these large-scale problems. In particular, O(N) scaling algorithms with parallelization capability up to millions of processors are needed. For a large materials-science problem, a natural approach to achieve this goal is the divide-and-conquer method: spatially divide the system into many small pieces, and solve each piece with a small local group of processors. This addresses the O(N) scaling and the parallelization problem at the same time. The challenge of this approach, however, is how to divide the system into small pieces and how to patch them back together without leaving a trace of the spatial division.
Here, we present a linear-scaling three-dimensional fragment (LS3DF) method which uses a novel division-patching scheme that cancels out the artificial boundary effects of the spatial division. As a result, the LS3DF results are essentially the same as the original full-system DFT results (with differences smaller than chemical accuracy and smaller than other numerical uncertainties, e.g., those due to numerical grids), while the required floating-point operations are thousands of times fewer, and the computational time thousands of times shorter, than for the conventional DFT method. For example, using a few thousand processors, LS3DF can calculate a >10,000-atom system within an hour, while the conventional method might take more than a month to finish. The LS3DF method is applicable to insulator and semiconductor systems; it covers a current gap in DOE's materials-science code portfolio for ab initio ultrascale simulation. We will use it here to solve the internal electric field problems for composite nanostructures.},
doi = {10.2172/929688},
place = {United States},
year = {2006},
month = {apr}
}
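The division-patching idea described in the abstract can be illustrated with a toy one-dimensional model. This is my own sketch, not the actual LS3DF algorithm (which uses overlapping 3D fragments and a real DFT solver); the cell values and the per-boundary artifact `beta` are invented purely to show how the alternating-sign fragment sum cancels artificial boundary effects:

```python
# Toy 1D illustration of a division-patching scheme: sum energies of
# overlapping 2-cell fragments and subtract 1-cell fragments, so that the
# fake boundary artifact `beta` cancels and the exact total is recovered.

def fragment_energy(cells, beta):
    """Energy of one fragment: sum of its cell values plus a fake
    artifact `beta` at each of its two artificial ends."""
    return sum(cells) + 2 * beta

def patched_energy(v, beta):
    """Divide-and-conquer estimate on a periodic chain: for each cell i,
    add the 2-cell fragment [i, i+1] and subtract the 1-cell fragment [i].
    The 2*beta boundary terms cancel in every difference."""
    n = len(v)
    total = 0.0
    for i in range(n):
        two_cell = [v[i], v[(i + 1) % n]]
        one_cell = [v[i]]
        total += fragment_energy(two_cell, beta) - fragment_energy(one_cell, beta)
    return total

v = [1.0, 2.5, -0.5, 3.0]   # toy "cell energies"
exact = sum(v)              # full-system result: 6.0
approx = patched_energy(v, beta=7.3)  # beta drops out entirely
print(exact, approx)        # both 6.0
```

Because each fragment pair contributes `(v[i] + v[i+1] + 2*beta) - (v[i] + 2*beta)`, the artifacts cancel exactly and the patched sum equals the full-system sum regardless of `beta` — the same cancellation principle, in 3D and with real fragment Hamiltonians, that lets LS3DF match full-system DFT to better than chemical accuracy.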

In this report we summarize research into new parallel algebraic multigrid (AMG) methods. We first provide an introduction to parallel AMG. We then discuss our research in parallel AMG algorithms for very large scale platforms. We detail significant improvements in the AMG setup phase and a matrix-matrix multiplication kernel. We present a smoothed-aggregation AMG algorithm with fewer communication synchronization points, and discuss its links to domain decomposition methods. Finally, we discuss a multigrid smoothing technique that utilizes two message-passing layers for use on multicore processors.

Dynamic stability calculations using vector and array processors. Final report
The newest generation of computers uses parallelism to enhance processing rates. This new technology makes possible speed improvements of a factor of 10 to 50 over even the fastest of the serial computers. However, these high processing rates are achieved only when the full capacity of the computer to perform operations in parallel is utilized. EPRI research project RP670 includes several tasks that investigate the degree to which the computations of dynamic stability analysis can be performed in parallel. Both transient stability and small-signal stability are considered. The transient stability problem is studied in significantly greater detail than the …
Power flow calculations utilizing array processors. Final report
Array processors can add substantially to the computation speed of low-cost supermini computers. In tests using a Bonneville Power Administration code, the devices showed promise in the solution portion of system power-flow calculations but also displayed operating characteristics that might offset that advantage.
Lightweight and Statistical Techniques for Petascale Debugging: Correctness on Petascale Systems (CoPS) Preliminary Report
Petascale platforms with O(10⁵) and O(10⁶) processing cores are driving advancements in a wide range of scientific disciplines. These large systems create unprecedented application-development challenges. Scalable correctness tools are critical to shortening the time-to-solution on these systems. Currently, many DOE application developers use primitive manual debugging based on printf or traditional debuggers such as TotalView or DDT. This paradigm breaks down beyond a few thousand cores, yet bugs often arise above that scale. Programmers must reproduce problems in smaller runs to analyze them with traditional tools, or else perform repeated runs at scale using only primitive techniques.
Lightweight and Statistical Techniques for PetaScale Debugging
This project investigated novel techniques for debugging scientific applications on petascale architectures. In particular, we developed lightweight tools that narrow the problem space when bugs are encountered. We also developed techniques that either limit the number of tasks and the code regions to which a developer must apply a traditional debugger, or that apply statistical techniques to provide direct suggestions of the location and type of error. We extended previous work on the Stack Trace Analysis Tool (STAT), which has already demonstrated scalability to over one hundred thousand MPI tasks. We also extended statistical techniques developed to isolate programming errors …