OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Programming Model for Massive Data Parallelism with Data Dependencies

Abstract

Accelerating processors can often be more cost- and energy-effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processing units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA's Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general-purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that far exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach: we run massively data-parallel applications on GPU clusters. We further propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from microbenchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains.
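
The explicit host/device memory split is the crux of the problem the abstract describes. As a concrete illustration (not taken from the paper), the minimal CUDA sketch below shows the "frequent transfer" workaround the abstract mentions: a data set too large to keep resident on the GPU is streamed through a fixed-size device buffer, one chunk at a time. The kernel, buffer sizes, and all names here are hypothetical assumptions for illustration, not the authors' code.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical element-wise kernel standing in for a data-parallel stage.
__global__ void scale(float *d_data, size_t n, float factor) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= factor;
}

int main(void) {
    const size_t total = 1 << 26;   // total elements: larger than we want resident on the GPU
    const size_t chunk = 1 << 22;   // elements per chunk that fit in a fixed device buffer
    float *h_data = (float *)malloc(total * sizeof(float));
    for (size_t i = 0; i < total; ++i) h_data[i] = 1.0f;

    float *d_buf;
    cudaMalloc(&d_buf, chunk * sizeof(float));  // explicit device-side allocation

    // Stream the data set through the device one chunk at a time:
    // copy in, compute, copy back. This is the frequent host<->GPU
    // transfer pattern the paper contrasts with its cluster approach.
    for (size_t off = 0; off < total; off += chunk) {
        size_t n = (total - off < chunk) ? (total - off) : chunk;
        cudaMemcpy(d_buf, h_data + off, n * sizeof(float), cudaMemcpyHostToDevice);
        int threads = 256;
        int blocks = (int)((n + threads - 1) / threads);
        scale<<<blocks, threads>>>(d_buf, n, 2.0f);
        cudaMemcpy(h_data + off, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaDeviceSynchronize();

    printf("h_data[0] = %f\n", h_data[0]);  // expect 2.0
    cudaFree(d_buf);
    free(h_data);
    return 0;
}

Note the cost structure this sketch exposes: every chunk crosses the PCIe bus twice, so bandwidth, not compute, quickly dominates, which is the motivation the abstract gives for investigating GPU clusters instead.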

Authors:
 Cui, Xiaohui [1]; Mueller, Frank [2]; Potok, Thomas E. [1]; Zhang, Yongpeng [1]
  1. ORNL
  2. North Carolina State University
Publication Date:
2009-01-01
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Laboratory Directed Research and Development (LDRD) Program; Work for Others (WFO)
OSTI Identifier:
964332
DOE Contract Number:
DE-AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: Parallel Architectures and Compilation Techniques (PACT), Raleigh, NC, USA, September 12, 2009
Country of Publication:
United States
Language:
English
Subject:
43 PARTICLE ACCELERATORS; ACCELERATORS; ARCHITECTURE; BENCHMARKS; CAPACITY; MANAGEMENT; PERFORMANCE; PROGRAMMING

Citation Formats

Cui, Xiaohui, Mueller, Frank, Potok, Thomas E., and Zhang, Yongpeng. A Programming Model for Massive Data Parallelism with Data Dependencies. United States: N. p., 2009. Web.
Cui, Xiaohui, Mueller, Frank, Potok, Thomas E., & Zhang, Yongpeng. A Programming Model for Massive Data Parallelism with Data Dependencies. United States, 2009.
Cui, Xiaohui, Mueller, Frank, Potok, Thomas E., and Zhang, Yongpeng. 2009. "A Programming Model for Massive Data Parallelism with Data Dependencies". United States.
@article{osti_964332,
title = {A Programming Model for Massive Data Parallelism with Data Dependencies},
author = {Cui, Xiaohui and Mueller, Frank and Potok, Thomas E. and Zhang, Yongpeng},
abstractNote = {Accelerating processors can often be more cost- and energy-effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processing units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA's Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general-purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that far exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach: we run massively data-parallel applications on GPU clusters. We further propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from microbenchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains.},
place = {United States},
year = {2009},
month = {1}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
