Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors
The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel Xeon Phi co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
- Research Organization:
- Thomas Jefferson National Accelerator Facility (TJNAF), Newport News, VA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- DOE Contract Number:
- AC05-06OR23177
- OSTI ID:
- 1169292
- Report Number(s):
- JLAB-IT-14-03; DOE/OR/23177-3298; arXiv:1412.2629
- Resource Relation:
- Conference: Supercomputing 2014, November 16-21, 2014, New Orleans,LA
- Country of Publication:
- United States
- Language:
- English
Similar Records
Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel® Xeon Phi™ Processor
QCD For Intel(R) Xeon Phi(tm) and Xeon(tm) processors