Title: Adapting wave-front algorithms to efficiently utilize systems with deep communication hierarchies

Large-scale systems increasingly exhibit a differential between intra-chip and inter-chip communication performance. Processor cores on the same socket can communicate at lower latencies and higher bandwidths than cores on different sockets, whether within the same node or between nodes. A key challenge is to use this communication hierarchy efficiently and hence optimize performance. We consider here the class of applications that contain wave-front processing. In these applications, data can be processed only after their upstream neighbors have been processed. Similar dependencies arise between processors: communication is required to pass boundary data downstream, and its cost is typically impacted by the slowest communication channel in use. In this work we develop a novel hierarchical wave-front approach that reduces the use of slower communications in the hierarchy, at the cost of additional computation and higher use of on-chip communications. This tradeoff is explored using a performance model, and an implementation on the petascale Roadrunner system demonstrates a 27% performance improvement at full system scale on a kernel application. The approach is generally applicable to large-scale multi-core and accelerated systems where a differential in system communication performance exists.
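The wave-front dependency described in the abstract can be illustrated with a minimal sketch. It assumes a 2D grid in which each cell depends on its upstream neighbors above and to the left, so cells on the same anti-diagonal are independent of one another. The function names and the additive update rule are hypothetical, chosen only for illustration. The sketch shows the basic dependency ordering, not the hierarchical approach or the performance model developed in the paper.

```python
def wavefront_diagonals(rows, cols):
    """Group cells (i, j) by anti-diagonal d = i + j.

    Every cell on diagonal d depends only on cells from diagonal d - 1,
    so the cells within one diagonal could be processed in parallel.
    """
    diagonals = [[] for _ in range(rows + cols - 1)]
    for i in range(rows):
        for j in range(cols):
            diagonals[i + j].append((i, j))
    return diagonals


def sweep(source):
    """Sweep a 2D grid from the top-left corner in wave-front order.

    The update rule (own value plus both upstream results) is an
    illustrative assumption, not the kernel used in the paper.
    """
    rows, cols = len(source), len(source[0])
    result = [[0.0] * cols for _ in range(rows)]
    for diagonal in wavefront_diagonals(rows, cols):
        for i, j in diagonal:
            # Upstream neighbors were computed on the previous diagonal.
            up = result[i - 1][j] if i > 0 else 0.0
            left = result[i][j - 1] if j > 0 else 0.0
            result[i][j] = source[i][j] + up + left
    return result
```

In a distributed setting each processor would own a block of the grid and pass its boundary values downstream, which is where the cost of the slowest communication channel enters.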
Authors:
 [1] ;  [1] ;  [1]
  1. Los Alamos National Laboratory
Publication Date:
OSTI Identifier:
971329
Report Number(s):
LA-UR-09-06298; LA-UR-09-6298
TRN: US201004%%91
DOE Contract Number:
AC52-06NA25396
Resource Type:
Conference
Resource Relation:
Conference: IEEE International Parallel & Distributed Processing Symposium ; April 19, 2010 ; Atlanta, GA
Research Org:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
97; ALGORITHMS; COMMUNICATIONS; IMPLEMENTATION; KERNELS; PERFORMANCE; PROCESSING