DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performance Models for the Spike Banded Linear System Solver

Abstract

With the availability of large-scale parallel platforms composed of tens of thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing the performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing the performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners compared to the state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, and (iii) we show excellent prediction capabilities of our model, based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations.
All of our results are validated on diverse heterogeneous multiclusters, platforms for which performance prediction is particularly challenging. Finally, we use our model to predict the scalability of the Spike algorithm on up to 65,536 cores. In this paper we extend the results presented at the Ninth International Symposium on Parallel and Distributed Computing.
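The abstract names the Truncated Spike algorithm without describing its steps. The sketch below follows the usual description of truncated Spike for diagonally dominant banded systems and is not the authors' parallel code: each of the p partitions independently solves for its own right-hand side and its two "spikes", truncation drops the spike tips that decay away from the diagonal block so that the reduced system splits into independent 2m x 2m systems at the partition interfaces, and a final independent step retrieves the solution. It runs on a single process with dense NumPy solves; the names `truncated_spike_solve` and `random_banded`, the equal-size partitions, and the degree of diagonal dominance are assumptions for illustration. A production solver would instead factor each banded block once (for example, with LU and UL factorizations) and compute only the spike tips it needs.

```python
import numpy as np


def random_banded(n, m, seed=0):
    """Hypothetical test matrix: half-bandwidth m, strictly diagonally dominant rows."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(n, n))
    i, k = np.indices((n, n))
    A[np.abs(i - k) > m] = 0.0    # keep only the band |i - k| <= m
    np.fill_diagonal(A, 4.0 * m)  # off-diagonal row sums are at most 2m < 4m
    return A


def truncated_spike_solve(A, f, p, m):
    """Truncated Spike sketch: p equal partitions, dense per-partition solves."""
    n = A.shape[0]
    s = n // p
    if s * p != n or s < 2 * m:
        raise ValueError("assumes p equal partitions of at least 2m rows each")
    g, V, W = [None] * p, [None] * p, [None] * p
    # Phase 1 (independent per partition): g_j = A_j^{-1} f_j and the spikes.
    for j in range(p):
        lo, hi = j * s, (j + 1) * s
        Aj = A[lo:hi, lo:hi]
        g[j] = np.linalg.solve(Aj, f[lo:hi])
        if j < p - 1:  # right spike V_j couples to the first m unknowns of block j+1
            rhs = np.zeros((s, m))
            rhs[-m:] = A[hi - m:hi, hi:hi + m]
            V[j] = np.linalg.solve(Aj, rhs)
        if j > 0:      # left spike W_j couples to the last m unknowns of block j-1
            rhs = np.zeros((s, m))
            rhs[:m] = A[lo:lo + m, lo - m:lo]
            W[j] = np.linalg.solve(Aj, rhs)
    # Phase 2: truncation drops the far tips V_j^(t) and W_j^(b), so the reduced
    # system splits into independent 2m x 2m systems, one per interface.
    I = np.eye(m)
    x_bot, x_top = [None] * p, [None] * p
    for j in range(p - 1):
        R = np.block([[I, V[j][-m:]], [W[j + 1][:m], I]])
        y = np.linalg.solve(R, np.concatenate([g[j][-m:], g[j + 1][:m]]))
        x_bot[j], x_top[j + 1] = y[:m], y[m:]
    # Phase 3 (independent per partition): x_j = g_j - V_j x_{j+1}^(t) - W_j x_{j-1}^(b).
    x = np.empty(n)
    for j in range(p):
        xj = g[j].copy()
        if j < p - 1:
            xj -= V[j] @ x_top[j + 1]
        if j > 0:
            xj -= W[j] @ x_bot[j - 1]
        x[j * s:(j + 1) * s] = xj
    return x


A = random_banded(400, 3)
f = np.random.default_rng(1).standard_normal(400)
x = truncated_spike_solve(A, f, p=4, m=3)
print(np.max(np.abs(x - np.linalg.solve(A, f))))  # expected to be tiny for strongly dominant A
```

Phases 1 and 3 run independently on every partition and phase 2 needs only data from neighbouring partitions, which gives each phase the regular, predictable cost that a phase-by-phase performance model can exploit.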
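The abstract calls the model pseudo-analytical: each phase gets an analytical cost expression, its coefficients are fitted from runtimes measured on the target platform, and the calibrated model can then be evaluated at core counts that were never run, such as 65,536. A minimal sketch of that calibrate-then-extrapolate workflow is below. The three cost terms (per-partition banded work proportional to (n/p) m^2, an m^3 reduced solve per interface, and a constant per-solve overhead) are illustrative assumptions based on standard operation counts, not the paper's actual expressions, and `cost_terms`, `fit_model`, and `predict` are hypothetical names.

```python
import numpy as np


def cost_terms(n, p, m):
    """Hypothetical per-phase cost terms; each row corresponds to one (n, p, m) run."""
    n, p, m = np.broadcast_arrays(
        *(np.atleast_1d(np.asarray(v, dtype=float)) for v in (n, p, m)))
    return np.column_stack([
        (n / p) * m**2,   # per-partition banded factorization and spike solves
        m**3,             # dense 2m x 2m reduced solve at each partition interface
        np.ones_like(n),  # fixed overhead, e.g. nearest-neighbour message latency
    ])


def fit_model(n, p, m, times):
    """Least-squares fit of one coefficient per cost term to measured run times."""
    X = cost_terms(n, p, m)
    scale = np.linalg.norm(X, axis=0)  # equilibrate columns of very different magnitude
    coef, *_ = np.linalg.lstsq(X / scale, np.asarray(times, dtype=float), rcond=None)
    return coef / scale


def predict(coef, n, p, m):
    """Evaluate the calibrated model, e.g. at core counts that were never measured."""
    return cost_terms(n, p, m) @ coef
```

To use it, call `fit_model` on timings from small runs, varying both p and m so that the three columns are linearly independent, then evaluate `predict(coef, n, 65536, m)` to extrapolate. The column scaling inside `fit_model` keeps the least-squares problem well conditioned when the terms differ by many orders of magnitude.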

Authors:
 Manguoglu, Murat [1]; Saied, Faisal [2]; Sameh, Ahmed [2]; Grama, Ananth [2]
  1. Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
  2. Department of Computer Science, Purdue University, West Lafayette, IN, USA
Publication Date:
Sponsoring Org.:
USDOE
OSTI Identifier:
1243136
Grant/Contract Number:  
FC52-08NA28617
Resource Type:
Published Article
Journal Name:
Scientific Programming
Additional Journal Information:
Journal Name: Scientific Programming Journal Volume: 19 Journal Issue: 1; Journal ID: ISSN 1058-9244
Publisher:
Hindawi Publishing Corporation
Country of Publication:
Egypt
Language:
English

Citation Formats

Manguoglu, Murat, Saied, Faisal, Sameh, Ahmed, and Grama, Ananth. Performance Models for the Spike Banded Linear System Solver. Egypt: N. p., 2011. Web. doi:10.1155/2011/426421.
Manguoglu, Murat, Saied, Faisal, Sameh, Ahmed, & Grama, Ananth. Performance Models for the Spike Banded Linear System Solver. Egypt. https://doi.org/10.1155/2011/426421
Manguoglu, Murat, Saied, Faisal, Sameh, Ahmed, and Grama, Ananth. 2011. "Performance Models for the Spike Banded Linear System Solver". Egypt. https://doi.org/10.1155/2011/426421.
@article{osti_1243136,
title = {Performance Models for the Spike Banded Linear System Solver},
author = {Manguoglu, Murat and Saied, Faisal and Sameh, Ahmed and Grama, Ananth},
abstractNote = {With the availability of large-scale parallel platforms composed of tens of thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing the performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing the performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners compared to the state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, and (iii) we show excellent prediction capabilities of our model, based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations.
All of our results are validated on diverse heterogeneous multiclusters, platforms for which performance prediction is particularly challenging. Finally, we use our model to predict the scalability of the Spike algorithm on up to 65,536 cores. In this paper we extend the results presented at the Ninth International Symposium on Parallel and Distributed Computing.},
doi = {10.1155/2011/426421},
journal = {Scientific Programming},
number = 1,
volume = 19,
place = {Egypt},
year = {2011},
month = {jan}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1155/2011/426421

Citation Metrics:
Cited by: 4 works
Citation information provided by Web of Science
