Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms
Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL's Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance: limits on application parallelism, or resource contention among concurrently running but independent tasks, prevent effective utilization of the added cores. Hyperthreading generally reduces throughput, but can improve performance when the runtime workflow configuration has not been tuned in detail. The observations offer some guidance for the procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth. Balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads commonly seen at NREL and tested here. Finally, perhaps the most impactful enhancement to productivity might come from enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once than by doing fast things in order.
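The closing point about concurrent jobs can be made concrete with a small throughput calculation. The sketch below is illustrative only: the function name `jobs_per_hour` and all timings are hypothetical and are not taken from the report. It assumes that a job run alone on a node takes 600 seconds, and that four jobs sharing the node each slow down to 1500 seconds because of resource contention.

```python
def jobs_per_hour(concurrent_jobs: int, seconds_per_job: float) -> float:
    """Node throughput when `concurrent_jobs` run side by side,
    each taking `seconds_per_job` of wall-clock time."""
    return concurrent_jobs * 3600.0 / seconds_per_job


# Hypothetical timings, NOT measurements from the report.
whole_node = jobs_per_hour(1, 600.0)    # one job at a time, full node each
shared_node = jobs_per_hour(4, 1500.0)  # four jobs at once, each 2.5x slower

print(f"one at a time: {whole_node:.1f} jobs/hour")
print(f"four at once:  {shared_node:.1f} jobs/hour")
```

Under these assumed numbers, sharing the node wins on throughput because the per-job slowdown (2.5x) is smaller than the number of concurrent jobs (4). If contention pushed the slowdown past 4x, running jobs in order would be the better choice.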
- Publication Date:
- OSTI Identifier:
- Report Number(s):
- DOE Contract Number:
- Resource Type:
- Technical Report
- Research Org:
- National Renewable Energy Laboratory (NREL)
- Sponsoring Org:
- USDOE Office of Energy Efficiency and Renewable Energy (EERE)
- Country of Publication:
- United States
- Subject:
- 97 MATHEMATICS AND COMPUTING; benchmarking; Haswell; Peregrine; STREAM; multiply; VASP; Gaussian; LAMMPS; Amber