OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Data parallelism

Abstract

Data locality is fundamental to performance on distributed-memory parallel architectures. Application programmers know this well and go to great pains to arrange data for optimal performance. Data parallelism, a model originating on Single Instruction Multiple Data (SIMD) architectures, is finding a new home on Multiple Instruction Multiple Data (MIMD) architectures. This style of programming, distinguished by taking the computation to the data, is what programmers have long done by hand. Recent work in this area holds the promise of making the programmer's task easier.
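For readers unfamiliar with the style, here is a minimal sketch (in Python, not from the paper; the worker count and function names are invented) of the "owner computes" idea the abstract describes: the data is split into per-worker chunks, and the same operation runs on each chunk where it lives, so the computation moves to the data.

```python
from multiprocessing import Pool

NUM_WORKERS = 4  # illustrative worker count, not from the paper

def scale_chunk(chunk):
    # The same instruction applied to every element of the locally owned data.
    return [2.0 * x for x in chunk]

if __name__ == "__main__":
    data = list(range(16))
    # Block decomposition: contiguous chunks mimic a distributed-memory layout,
    # one chunk per worker, so each worker computes only on the data it owns.
    size = (len(data) + NUM_WORKERS - 1) // NUM_WORKERS
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(NUM_WORKERS) as pool:
        result = [x for part in pool.map(scale_chunk, chunks) for x in part]
    print(result)
```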

Authors:
Gorda, B.C.
Publication Date:
1992-09-01
Research Org.:
Lawrence Livermore National Lab., CA (United States)
Sponsoring Org.:
USDOE, Washington, DC (United States)
OSTI Identifier:
6640647
Alternate Identifier(s):
OSTI ID: 6640647; Legacy ID: DE93009300
Report Number(s):
UCRL-JC-111827; CONF-930117--6
ON: DE93009300
DOE Contract Number:
W-7405-ENG-48
Resource Type:
Conference
Resource Relation:
Conference: 26th Hawaii International Conference on System Sciences, Biotechnology Computing Track, Kauai, HI (United States), 5-8 Jan 1993
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ARRAY PROCESSORS; MEMORY MANAGEMENT; DISTRIBUTED DATA PROCESSING; PARALLEL PROCESSING; PERFORMANCE; DATA PROCESSING; PROCESSING; PROGRAMMING; 990200 -- Mathematics & Computers

Citation Formats

Gorda, B.C. Data parallelism. United States: N. p., 1992. Web.
Gorda, B.C. Data parallelism. United States.
Gorda, B.C. 1992. "Data parallelism". United States.
@article{osti_6640647,
  title = {Data parallelism},
  author = {Gorda, B.C.},
  abstractNote = {Data locality is fundamental to performance on distributed-memory parallel architectures. Application programmers know this well and go to great pains to arrange data for optimal performance. Data parallelism, a model originating on Single Instruction Multiple Data (SIMD) architectures, is finding a new home on Multiple Instruction Multiple Data (MIMD) architectures. This style of programming, distinguished by taking the computation to the data, is what programmers have long done by hand. Recent work in this area holds the promise of making the programmer's task easier.},
  place = {United States},
  year = {1992},
  month = {9}
}

Other availability:
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar records:
  • Accelerating processors can often be more cost- and energy-effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processing units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA's Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general-purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that well exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently (the first sketch after this list illustrates that chunked streaming). In this work, we investigate another approach: we run massively data-parallel applications on GPU clusters. We further propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from micro-benchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains.
  • The increased computational power of massively parallel computers and high-bandwidth, low-latency computer networks will make a wide range of previously impractical problems feasible. This will inevitably result in the need to develop parallel software whose complexity far exceeds that of parallel programs being developed today. These programs will combine task and data parallelism within a single application. In this workshop, the authors will discuss multi-paradigm parallel programs and programming languages to support their development. They will introduce the parallel programming languages Fortran M and Compositional C++. Fortran M is a small set of extensions to Fortran 77; Compositional C++ is a small set of extensions to C++. They will demonstrate how these languages can be used to develop parallel programs that contain both task and data parallelism (the second sketch after this list illustrates the combination) and how these languages are well suited to writing reusable parallel program libraries.
  • The Image Content Engine (ICE) is a framework of software and underlying mathematical and physical models that enables scientists and analysts to extract features from terabytes of imagery and search the extracted features for content relevant to their problem domain. The ICE team has developed a set of tools for feature extraction and analysis of image data, primarily based on the image content. The scale and volume of imagery that must be searched present a formidable computation and data-bandwidth challenge, and a search of moderate- to large-scale imagery quickly becomes intractable without exploiting high degrees of data parallelism in the feature extraction engine (the third sketch after this list illustrates the per-image pattern). In this paper we describe the software and hardware architecture developed to build a data-parallel processing engine for ICE. We discuss our highly tunable parallel process and job scheduling subsystem, remote procedure invocation, parallel I/O strategy, and our experience in running ICE on a 16-node, 32-processing-element (CPU) Linux cluster. We present performance and benchmark results, and describe how we obtain excellent speedup for the imagery searches in our test-bed prototype.
  • R is a domain-specific language widely used for data analysis by the statistics community as well as by researchers in finance, biology, social sciences, and many other disciplines. As R programs are linked to input data, the exponential growth of available data makes high-performance computing with R imperative. To ease the process of writing parallel programs in R, code transformation from a sequential program to a parallel version would bring much convenience to R users. In this paper, we present our work in semiautomatic parallelization of R codes with user-added OpenMP-style pragmas. While such pragmas are used at the frontend, we take advantage of multiple parallel backends with different R packages. We provide flexibility for importing parallelism with plug-in components, impose built-in MapReduce for data processing, and also maintain code reusability (the fourth sketch after this list shows the map-then-reduce shape such a backend provides). We illustrate the advantage of the on-the-fly mechanisms, which can lead to significant applications in data-centered parallel computing.
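First sketch: a minimal illustration of the chunked host-to-device streaming mentioned in the first abstract above, for data that exceeds accelerator memory. The "device" is simulated in plain Python; the capacity constant and function names are illustrative assumptions, not the paper's API.

```python
# Simulated out-of-core GPU processing: stream fixed-size slices of host data
# through a small device-sized buffer, launching the same kernel on each slice.
DEVICE_CAPACITY = 4  # elements the simulated accelerator can hold at once

def device_kernel(buffer):
    # Stand-in for a data-parallel GPU kernel applied to the resident chunk.
    return [x * x for x in buffer]

def stream_through_device(host_data):
    results = []
    for start in range(0, len(host_data), DEVICE_CAPACITY):
        chunk = host_data[start:start + DEVICE_CAPACITY]  # host -> device copy
        results.extend(device_kernel(chunk))              # device -> host copy
    return results

print(stream_through_device(list(range(10))))
```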
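Second sketch: not Fortran M or Compositional C++, just plain Python illustrating the combination the workshop abstract describes, with task parallelism across two independent stages and data parallelism (the same operation over every element) inside each. Stage names and inputs are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def smooth(values):
    # Data parallelism: the same operation applied to every element.
    return [v / 2.0 for v in values]

def threshold(values):
    # A second, independent data-parallel operation.
    return [v > 1.0 for v in values]

with ThreadPoolExecutor() as pool:
    # Task parallelism: the two independent stages run concurrently.
    fut_a = pool.submit(smooth, [2.0, 4.0, 6.0])
    fut_b = pool.submit(threshold, [0.5, 1.5, 2.5])
    print(fut_a.result(), fut_b.result())
```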
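Third sketch: a loose illustration (not the ICE implementation) of why image search parallelizes well. Feature extraction is independent per image or tile, so work can be farmed out to workers much as ICE schedules jobs across cluster nodes; the placeholder "feature" here is just a tile mean.

```python
from multiprocessing import Pool

def extract_features(tile):
    # Placeholder feature: mean intensity of the tile.
    return sum(tile) / len(tile)

if __name__ == "__main__":
    tiles = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    with Pool(4) as pool:
        # Data parallelism: the same extractor runs on every tile.
        print(pool.map(extract_features, tiles))
```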
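Fourth sketch: the paper targets R, but the map-then-reduce shape its pragma-driven backends give a marked loop can be shown in Python (the language used for the other sketches here). The loop body and reduction are invented examples.

```python
from functools import reduce
from multiprocessing import Pool

def body(x):
    # The loop body a pragma would mark as safe to run in parallel.
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        mapped = pool.map(body, range(8))          # parallel map phase
    total = reduce(lambda a, b: a + b, mapped, 0)  # sequential reduce phase
    print(total)
```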