HPC application performance and scaling: understanding trends and future challenges with application benchmarks on past, present and future Tri-Lab computing systems.
In this paper, we investigate HPC architectural characteristics and their impact on application performance and scaling. We analyze performance data gathered over several generations of very large HPC systems, such as ASC Red Storm, ASC Purple, and Red Sky, a large InfiniBand cluster. As the number of cache-coherent cores and the number of NUMA domains per compute node keep increasing, we analyze their impact with a few simple benchmarks and several applications. We present bottlenecks and remedies identified by examining production applications. We conclude with preliminary early-hardware performance data from ASC Cielo, a future petaFLOPS-class capability system.