A Distributed OpenCL Framework using Redundant Computation and Data Replication
Applications written solely in OpenCL or CUDA cannot execute on a cluster as a whole. Most previous approaches that extend these programming models to clusters are based on a common idea: designating a centralized host node and coordinating the other nodes with the host for computation. However, the centralized host node is a serious performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework called SnuCL-D for large-scale clusters. SnuCL-D's remote device virtualization provides an OpenCL application with an illusion that all compute devices in a cluster are confined in a single node. To reduce the amount of control-message and data communication between nodes, SnuCL-D replicates the OpenCL host program execution and data in each node. We also propose a new OpenCL host API function and a queueing optimization technique that significantly reduce the overhead incurred by the previous centralized approaches. To show the effectiveness of SnuCL-D, we evaluate SnuCL-D with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster and a medium-scale GPU cluster.
- Publication Date:
- OSTI Identifier:
- DOE Contract Number:
- Resource Type:
- Resource Relation:
- Conference: ACM SIGPLAN Conference on Programming Language Design and Implementation, Santa Barbara, CA, USA, 20160613, 20160617
- Research Org:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org:
- Country of Publication:
- United States
Enter terms in the toolbar above to search the full text of this document for pages containing specific keywords.