Towards Optimal Petascale Simulations
Our goal in this project was to design the scalable numerical algorithms needed by SciDAC applications, algorithms that adapt to use evolving hardware resources as efficiently as possible. The primary challenge is minimizing communication costs, where communication means moving data either between levels of a memory hierarchy (L1 cache to L2 cache to main memory, etc.) or between processors over a network. Floating-point rates are improving exponentially faster than bandwidth, which in turn is improving exponentially faster than latency, so the relative cost of communication keeps growing. We describe our progress in this area for both direct and iterative linear algebra. In both areas we have (1) identified lower bounds on the amount of communication (measured by both the number of words moved and the number of messages) required to perform these algorithms, (2) analyzed existing algorithms, which by and large do not attain these lower bounds, and (3) identified or invented new algorithms that do attain them, and evaluated their speedups, which can be quite large.
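As a rough illustration of why counting words moved matters, the following minimal sketch compares two ways of multiplying n-by-n matrices with a fast memory of M words. It is not taken from the report: the function names, the simplified counting model (an unblocked loop that reuses nothing held in fast memory), and the choice of block size are assumptions made for this example. The reference scale n^3 / sqrt(M) is the well-known asymptotic lower bound for classical matrix multiplication, with its constant factor omitted.

```python
import math

def words_moved_naive(n):
    # Assumed model: for each of the n*n entries of C, a row of A and a
    # column of B (n words each) are read, and the entry is read and written.
    return 2 * n**3 + 2 * n**2

def words_moved_blocked(n, M):
    # Choose the largest block size b such that three b-by-b blocks fit in
    # fast memory (3 * b^2 <= M). Assumes b divides n for an exact count.
    b = math.isqrt(M // 3)
    blocks = n // b
    # Each C block is read and written once (2 * b^2 words) and needs
    # `blocks` pairs of A and B blocks (2 * b^2 words per pair).
    return blocks**2 * (2 * b**2 + blocks * 2 * b**2)

def lower_bound_scale(n, M):
    # Asymptotic lower bound on words moved, constant factor omitted.
    return n**3 / math.sqrt(M)

n, M = 1024, 3 * 64**2  # hypothetical sizes; the block size works out to 64
naive = words_moved_naive(n)
blocked = words_moved_blocked(n, M)
print(naive, blocked, blocked / lower_bound_scale(n, M))
```

Under these assumptions the blocked variant moves a small constant multiple of the lower-bound scale, while the unblocked loop moves roughly sqrt(M) times more, which is the kind of gap the analysis in items (1) to (3) quantifies.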