OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable Parallel Utopia


This contribution proposes a 128-bit-wide interface structure, clocked at approximately 80 MHz and operating at 10 Gbps, as a strawman for an OC-192c Utopia Specification. It also proposes scaling the width of data transfers so that clock rates remain manageably low.
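The width-versus-clock trade-off in the abstract is simple arithmetic: a bus that moves one full word per clock has a raw throughput of width × clock, so 128 bits at 80 MHz gives 10.24 Gbps. The sketch below is illustrative only and rests on assumptions not taken from the contribution: one word transferred per clock with no idle or overhead cycles, a hypothetical menu of power-of-two widths, and a hypothetical maximum clock passed in by the caller. The helper names are invented for this example. The OC-192 figure is the standard SONET line rate (192 × 51.84 Mbps), which includes framing overhead, so the cell payload actually carried across a Utopia bus is somewhat lower.

```python
# Sketch of the bus-width vs. clock-rate trade-off for a parallel cell interface.
# Assumptions (not from the contribution): one full word moves per clock, there are
# no overhead cycles, and candidate widths are powers of two from 8 to 256 bits.

OC192_LINE_RATE_BPS = 192 * 51.84e6  # SONET OC-192 line rate, including overhead
WIDTHS = (8, 16, 32, 64, 128, 256)   # hypothetical set of selectable bus widths


def required_clock_hz(rate_bps: float, width_bits: int) -> float:
    """Clock rate needed to sustain rate_bps on a width_bits-wide bus."""
    return rate_bps / width_bits


def throughput_bps(width_bits: int, clock_hz: float) -> float:
    """Raw throughput of a width_bits-wide bus clocked at clock_hz."""
    return width_bits * clock_hz


def narrowest_width(rate_bps: float, max_clock_hz: float, widths=WIDTHS) -> int:
    """Smallest listed width whose required clock stays at or below max_clock_hz.

    max_clock_hz is a hypothetical design limit supplied for illustration.
    """
    for w in sorted(widths):
        if required_clock_hz(rate_bps, w) <= max_clock_hz:
            return w
    raise ValueError("no listed width meets the clock limit")


if __name__ == "__main__":
    rate = 10e9  # 10 Gbps target from the abstract
    for w in WIDTHS:
        print(f"{w:4d} bits -> {required_clock_hz(rate, w) / 1e6:9.3f} MHz")
    raw = throughput_bps(128, 80e6)
    print(f"128 bits at 80 MHz: {raw / 1e9:.2f} Gbps "
          f"(OC-192 line rate {OC192_LINE_RATE_BPS / 1e9:.5f} Gbps)")
    print(narrowest_width(rate, 100e6), "bits needed for a 100 MHz clock limit")
```

Running the table shows 10 Gbps needs 78.125 MHz at 128 bits but 1.25 GHz at 8 bits, which is the motivation for widening the bus rather than raising the clock as line rates grow.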

Research Org.:
Sandia National Laboratories, Albuquerque, NM, and Livermore, CA
Report Number(s):
ON: DE00000756
Resource Relation:
Conference: The ATM Forum Technical Committee; Gold Coast, Australia; 10/5-9/1998
Country of Publication:
United States
99 MATHEMATICS, COMPUTERS, INFORMATION SCIENCE, MANAGEMENT, LAW, MISCELLANEOUS; Sandia National Laboratories; Computers; Equipment Interfaces

Citation Formats

King, D., and Pierson, L. Scalable Parallel Utopia. United States: N. p., 1998. Web.
King, D., & Pierson, L. Scalable Parallel Utopia. United States.
King, D., and Pierson, L. 1998. "Scalable Parallel Utopia". United States.
title = {Scalable Parallel Utopia},
author = {King, D. and Pierson, L.},
abstractNote = {This contribution proposes a 128 bit wide interface structure clocked at approximately 80 MHz that will operate at 10 Gbps as a strawman for an OC-192c Utopia Specification. In addition, the concept of scalable width of data transfers in order to maintain manageably low clock rates is proposed.},
place = {United States},
year = 1998


Similar Records
  • Scalable parallel processing is a driving force in high-performance computing research. The notion of scalable parallel algorithms continues to elude computational science as a whole. Some problems are trivially parallelized, while others have hidden parallelism and pipelining. Von Neumann languages, such as FORTRAN and C, are a limiting factor in the performance of parallel software. This paper proposes a software architecture that supports scalable parallel software.
  • pC++ is a language extension to C++ designed to allow programmers to compose "concurrent aggregate" collection classes, which can be aligned and distributed over the memory hierarchy of a parallel machine in a manner modeled on the High Performance Fortran Forum (HPFF) directives for Fortran 90. pC++ allows the user to write portable and efficient code that will run on a wide range of scalable parallel computer systems. The first version of the compiler is a preprocessor that generates Single Program Multiple Data (SPMD) C++ code. Currently, it runs on the Thinking Machines CM-5, the Intel Paragon, the BBN TC2000, the Kendall Square Research KSR-1, and the Sequent Symmetry. In this paper the authors describe the implementation of the runtime system, which provides the concurrency and communication primitives between objects in a distributed collection. To illustrate the behavior of the runtime system, they include a description and performance results on four benchmark programs.
  • A new era of high-energy physics research is beginning, one requiring accelerators with much higher luminosities and interaction rates in order to discover new elementary particles. As a consequence, both the data rates from the detector and the online processing power must increase by orders of magnitude, well beyond the capabilities of current high-energy physics data acquisition systems. This paper describes a proposed new data acquisition system architecture that draws heavily from the communications industry, is totally parallel (i.e., without any bottlenecks), is capable of data rates of hundreds of gigabytes per second from the detector into an array of online processors (i.e., a processor farm), and uses an open systems architecture to guarantee compatibility with future commercially available online processor farms. The main features of the proposed Scalable Parallel Open Architecture data acquisition system are standard interface ICs to detector subsystems wherever possible, fiber-optic digital data transmission from the near-detector electronics, a self-routing parallel event builder, and the use of industry-supported, high-level-language-programmable processors in the proposed BCD system for both triggers and online filters. A brief status report of an ongoing project at Fermilab to build a prototype of the proposed data acquisition system architecture is given in the paper. The major component of the system, a self-routing parallel event builder, is described in detail.
  • We are pleased to submit our efforts in parallelizing the PRONTO application suite for consideration in the SuParCup 99 competition. PRONTO is a finite-element transient dynamics simulator that includes a smoothed particle hydrodynamics (SPH) capability; it is similar in scope to the well-known DYNA, PamCrash, and ABAQUS codes. Our efforts over the last few years have produced a fully parallel version of the entire PRONTO code which (1) runs fast and scalably on thousands of processors, (2) has performed the largest finite-element transient dynamics simulations we are aware of, and (3) includes several new parallel algorithmic ideas that have solved some difficult problems associated with contact detection and SPH scalability. We motivate this work, describe the novel algorithmic advances, give performance numbers for PRONTO running on Sandia's Intel Teraflop machine, and highlight two prototypical large-scale computations we have performed with the parallel code. We have successfully parallelized a large-scale production transient dynamics code with a novel algorithmic approach that utilizes multiple decompositions for different key segments of the computations. Simulating a model of more than ten million elements in a few tenths of a second per timestep is unprecedented for solid dynamics simulations, especially when full global contact searches are required. The key reason is our new algorithmic ideas for efficiently parallelizing the contact detection stage. To our knowledge, scalability of this computation had never before been demonstrated on more than 64 processors. This has enabled parallel PRONTO to become the only solid dynamics code we are aware of that can run effectively on thousands of processors. More importantly, our parallel performance compares very favorably to the original serial PRONTO code, which is optimized for vector supercomputers.
On the container crush problem, a Teraflop node is as fast as a single processor of the Cray Jedi. This means that on the Teraflop machine we can now run simulations with tens of millions of elements thousands of times faster than we could on the Jedi! This is enabling transient dynamics simulations of unprecedented scale and fidelity. Not only can previous applications be run with vastly improved resolution and speed, but qualitatively new and different analyses have been made possible.
  • Scalable local-memory computer systems, such as IBM's SP1, have particular implications for interior-point and simplex algorithms for general large sparse linear programs. This paper describes the current status of Parallel Linear and Mixed Integer programming in the Optimization Subroutine Library and gives computational results.