skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Testbed of Parallel Kernels for Computer Science Research

Technical Report ·
DOI:https://doi.org/10.2172/983273· OSTI ID:983273

For several decades, computer scientists have sought guidance on how to evolve architectures, languages, and programming models for optimal performance, efficiency, and productivity. Unfortunately, this guidance is most often taken from the existing software/hardware ecosystem. Architects attempt to provide micro-architectural solutions to improve performance on fixed binaries. Researchers tweak compilers to improve code generation for existing architectures and implementations, and they may invent new programming models for fixed processor and memory architectures and computational algorithms. In today's rapidly evolving world of on-chip parallelism, these isolated and iterative improvements to performance may miss superior solutions in the same way gradient descent optimization techniques may get stuck in local minima. In an initial study, we have developed an alternate approach that, rather than starting with an existing hardware/software solution laced with hidden assumptions, defines the computational problems of interest and invites architects, researchers and programmers to implement novel hardware/ software co-designed solutions. Our work builds on the previous ideas of computational dwarfs, motifs, and parallel patterns by selecting a representative set of essential problems for which we provide: An algorithmic description; scalable problem definition; illustrative reference implementations; verification schemes. For simplicity, we focus initially on the computational problems of interest to the scientific computing community but proclaim the methodology (and perhaps a subset of the problems) as applicable to other communities. We intend to broaden the coverage of this problem space through stronger community involvement. Previous work has established a broad categorization of numerical methods of interest to the scientific computing, in the spirit of the NAS Benchmarks, which pioneered the basic idea of a 'pencil and paper benchmark' in the 1990s. The initial result of the more modern study was the seven dwarfs, which was subsequently extended to 13 motifs. These motifs have already been useful in defining classes of applications for architecture-software studies. However, these broad-brush problem statements often miss the nuance seen in individual kernels. For example, the computational requirements of particle methods vary greatly between the naive (but more accurate) direct calculations and the particle-mesh and particle-tree codes. Thus we commenced our study with an enumeration of problems, but then proceeded by providing not only reference implementations for each problem, but more importantly a mathematical definition that allows one to escape iterative approaches to software/hardware optimization. To ensure long term value, we have augmented each of our reference implementations with both a scalable problem generator and a verification scheme. In a paper we have prepared that documents our efforts, we describe in detail this process of problem definition, scalable input creation, verification, and implementation of reference codes for the scientific computing domain. Table 1 enumerates and describes the level of support we've developed for each kernel. We group these important kernels using the Berkeley dwarfs/motifs taxonomy using a red box in the appropriate column. As kernels become progressively complex, they build upon other, simpler computational methods. We note this dependency via orange boxes. After enumeration of the important numerical problems, we created a domain-appropriate high-level definition of each problem. To ensure future endeavors are not tainted by existing implementations, we specified the problem definition to be independent of both computer architecture and existing programming languages, models, and data types. Then, to provide context as to how such kernels productively map to existing architectures, languages and programming models, we produced reference implementations for most of the kernel (see table). These sample codes should be viewed as 'hints,' designed to show how other designers have mapped a problem's operands and operators to existing hardware and software. Since we wanted such implementations to be illustrative, we tried to ensure they were the most straightforward implementation in the easiest to understand languages using familiar architectures. To that end, most of the linear algebra-oriented computations are written in MATLAB using array indexing to process matrices, rather than one-line library calls to compute the same kernel. This ensures that the kernel's computation is explicit and readable in the implementation and not hidden behind a library. For other problems, such as the Barnes-Hut n-Body solver, the implementations were written in pure C, without any supporting library computations. Along this line, we have also created a scalable problem generator to accompany each computation.

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
Computational Research Division
DOE Contract Number:
DE-AC02-05CH11231
OSTI ID:
983273
Report Number(s):
LBNL-3401E; TRN: US201014%%601
Country of Publication:
United States
Language:
English