DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Hierarchical Krylov and nested Krylov methods for extreme-scale computing

Abstract

The solution of large, sparse linear systems is typically a dominant phase of computation for simulations based on partial differential equations, which are ubiquitous in scientific and engineering applications. While preconditioned Krylov methods are widely used and provide many advantages for solving sparse linear systems that do not have highly convergent, geometric multigrid solvers or specialized fast solvers, Krylov methods encounter well-known scaling difficulties for over 10,000 processor cores because each iteration requires at least one vector inner product, which in turn requires a global synchronization that scales poorly because of internode latency. To aid in overcoming these difficulties, we have developed hierarchical Krylov methods and nested Krylov methods in the PETSc library that reduce the number of global inner products required across the entire system (where they are expensive), though freely allow vector inner products across smaller subsets of the entire system (where they are inexpensive) or use inner iterations that do not invoke vector inner products at all. Nested Krylov methods are a generalization of inner-outer iterative methods with two or more layers. Hierarchical Krylov methods are a generalization of block Jacobi and overlapping additive Schwarz methods, where each block itself is solved by Krylov methods on smaller blocks. Conceptually, the hierarchy can continue recursively to an arbitrary number of levels of smaller and smaller blocks. As a specific case, we introduce the hierarchical FGMRES method, or h-FGMRES, and we demonstrate the impact of two-level h-FGMRES with a variable preconditioner on the PFLOTRAN subsurface flow application. We also demonstrate the impact of nested FGMRES, BiCGStab and Chebyshev methods.
These hierarchical Krylov methods and nested Krylov methods significantly reduced overall PFLOTRAN simulation time on the Cray XK6 when using 10,000 through 224,000 cores through the combined effects of reduced global synchronization due to fewer global inner products and stronger inner hierarchical or nested preconditioners.
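
Illustrative sketch (not taken from the paper or from PETSc): the two-level idea in the abstract can be imitated serially in plain Python. The assumptions here are all ours: the model matrix is tridiag(-1, 4, -1); the outer solver is a flexible GCR iteration standing in for FGMRES, since both tolerate a preconditioner that changes from one iteration to the next; each diagonal block is solved approximately by a few CG steps standing in for the inner Krylov solver; and the names matvec, inner_cg, block_precond and flexible_gcr are hypothetical. Inner products taken over the whole vector are tallied as "global" and those confined to one block as "local", to show where the synchronization cost would fall on a parallel machine.

```python
import math

def matvec(x):
    """y = A x for the assumed model matrix A = tridiag(-1, 4, -1)."""
    n = len(x)
    return [4.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def dot(u, v, counts, kind):
    """Inner product; 'kind' records whether it spans the whole system or one block."""
    counts[kind] += 1
    return sum(a * b for a, b in zip(u, v))

def inner_cg(rhs, its, counts):
    """A few CG steps on one diagonal block (itself tridiag(-1, 4, -1)); local dots only."""
    z = [0.0] * len(rhs)
    r = rhs[:]
    p = rhs[:]
    rr = dot(r, r, counts, "local")
    for _ in range(its):
        if rr < 1e-30:  # block residual already negligible
            break
        ap = matvec(p)
        alpha = rr / dot(p, ap, counts, "local")
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rr_new = dot(r, r, counts, "local")
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return z

def block_precond(r, nblocks, its, counts):
    """Block Jacobi: each block is solved inexactly by inner CG (len(r) divisible by nblocks)."""
    m = len(r) // nblocks
    z = []
    for blk in range(nblocks):
        z.extend(inner_cg(r[blk * m:(blk + 1) * m], its, counts))
    return z

def flexible_gcr(b, nblocks=4, inner_its=6, tol=1e-8, maxit=50):
    """Outer flexible GCR; the preconditioner may differ at every iteration."""
    counts = {"global": 0, "local": 0}
    x = [0.0] * len(b)
    r = b[:]
    bnorm = math.sqrt(dot(b, b, counts, "global"))
    Z, Q = [], []  # search directions z_j and their images q_j = A z_j
    for _ in range(maxit):
        z = block_precond(r, nblocks, inner_its, counts)
        q = matvec(z)
        for zj, qj in zip(Z, Q):  # orthogonalize q against earlier q_j
            beta = dot(q, qj, counts, "global")
            q = [qi - beta * v for qi, v in zip(q, qj)]
            z = [zi - beta * v for zi, v in zip(z, zj)]
        qnorm = math.sqrt(dot(q, q, counts, "global"))
        if qnorm == 0.0:  # breakdown guard
            break
        q = [qi / qnorm for qi in q]
        z = [zi / qnorm for zi in z]
        alpha = dot(r, q, counts, "global")
        x = [xi + alpha * zi for xi, zi in zip(x, z)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        Z.append(z)
        Q.append(q)
        if math.sqrt(dot(r, r, counts, "global")) <= tol * bnorm:
            break
    return x, counts

b = [1.0] * 64
x, counts = flexible_gcr(b)
true_res = [bi - yi for bi, yi in zip(b, matvec(x))]
print("relative residual:", math.sqrt(sum(v * v for v in true_res)) / math.sqrt(len(b)))
print("inner products:", counts)
```

In this toy setting most inner products land in the "local" tally, which is the effect the abstract attributes to the hierarchical approach. A further level of the hierarchy would replace inner_cg with another preconditioned Krylov solve over smaller sub-blocks.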

Authors:
McInnes, Lois Curfman [1]; Smith, Barry [1]; Zhang, Hong [2]; Mills, Richard Tran [3]
  1. Argonne National Lab. (ANL), Argonne, IL (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States); Illinois Inst. of Technology, Chicago, IL (United States)
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Tennessee, Knoxville, TN (United States)
Publication Date:
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23). Climate and Environmental Sciences Division
OSTI Identifier:
1565143
Grant/Contract Number:  
AC02-05CH11231; AC02-06CH11357; AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Parallel Computing
Additional Journal Information:
Journal Volume: 40; Journal Issue: 1; Journal ID: ISSN 0167-8191
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Hierarchical; Nested; Krylov methods; Variable preconditioner

Citation Formats

McInnes, Lois Curfman, Smith, Barry, Zhang, Hong, and Mills, Richard Tran. Hierarchical Krylov and nested Krylov methods for extreme-scale computing. United States: N. p., 2013. Web. doi:10.1016/j.parco.2013.10.001.
McInnes, Lois Curfman, Smith, Barry, Zhang, Hong, & Mills, Richard Tran. Hierarchical Krylov and nested Krylov methods for extreme-scale computing. United States. doi:10.1016/j.parco.2013.10.001.
McInnes, Lois Curfman, Smith, Barry, Zhang, Hong, and Mills, Richard Tran. 2013. "Hierarchical Krylov and nested Krylov methods for extreme-scale computing". United States. doi:10.1016/j.parco.2013.10.001. https://www.osti.gov/servlets/purl/1565143.
@article{osti_1565143,
title = {Hierarchical Krylov and nested Krylov methods for extreme-scale computing},
author = {McInnes, Lois Curfman and Smith, Barry and Zhang, Hong and Mills, Richard Tran},
doi = {10.1016/j.parco.2013.10.001},
journal = {Parallel Computing},
number = 1,
volume = 40,
place = {United States},
year = {2013},
month = {10}
}


Citation Metrics:
Cited by: 15 works (citation information provided by Web of Science)
