Latency, bandwidth, and concurrent issue limitations in high-performance CFD.
To achieve high performance, a parallel algorithm must use the memory subsystem effectively and minimize both the communication volume and the number of network transactions. These issues become more important on modern architectures, where peak CPU performance is increasing much more rapidly than memory or network performance. In this paper, we present some performance-enhancing techniques that were applied to an unstructured-mesh implicit solver. Our experimental results show that this solver adapts reasonably well to high memory and network latencies.
- Research Organization: Argonne National Lab., IL (US)
- Sponsoring Organization: US Department of Energy (US)
- DOE Contract Number: W-31109-ENG-38
- OSTI ID: 768614
- Report Number(s): ANL/MCS/CP-103358; TRN: US200223%%640
- Resource Relation: Conference: First M.I.T. Conference on Computational Fluid and Solid Mechanics, Cambridge, MA (US), 06/12/2001--06/14/2001; Other Information: PBD: 10 Nov 2000
- Country of Publication: United States
- Language: English
Similar Records
- Scientific Application Requirements for Leadership Computing at the Exascale (Technical Report, 1 Dec 2007)
- Roofline Analysis in the Intel® Advisor to Deliver Optimized Performance for applications on Intel® Xeon Phi™ Processor (Conference, 23 May 2017)
- Parallel radiation transport algorithms and associated architectural requirements (Conference, 1 Jan 2004)