National Library of Energy BETA

Sample records for massively parallel microcell-based

  1. Discontinuous Methods for Accurate, Massively Parallel Quantum...

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

Investigator for Discontinuous Methods for Accurate, Massively Parallel Quantum Molecular Dynamics.

  2. Impact analysis on a massively parallel computer

    SciTech Connect (OSTI)

    Zacharia, T.; Aramayo, G.A.

    1994-06-01

    Advanced mathematical techniques and computer simulation play a major role in evaluating and enhancing the design of beverage cans, industrial, and transportation containers for improved performance. Numerical models are used to evaluate the impact requirements of containers used by the Department of Energy (DOE) for transporting radioactive materials. Many of these models are highly compute-intensive. An analysis may require several hours of computational time on current supercomputers despite the simplicity of the models being studied. As computer simulations and materials databases grow in complexity, massively parallel computers have become important tools. Massively parallel computational research at the Oak Ridge National Laboratory (ORNL) and its application to the impact analysis of shipping containers is briefly described in this paper.

  3. Template based parallel checkpointing in a massively parallel computer system

    DOE Patents [OSTI]

    Archer, Charles Jens; Inglett, Todd Alan

    2009-01-13

A method and apparatus for a template-based parallel checkpoint save for a massively parallel supercomputer system using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored, for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high-speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
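The core idea, comparing each node's checkpoint blocks against checksums of a previously stored template and transmitting only the blocks that differ (losslessly compressed), can be sketched as follows. This is an illustrative single-node sketch, not the patented implementation; the block size, hash choice, and function names are invented for the example.

```python
import hashlib
import zlib

def block_checksums(data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks and checksum each one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [hashlib.sha256(b).hexdigest() for b in blocks]

def delta_checkpoint(template: bytes, current: bytes, block_size: int = 4):
    """Return (index, compressed block) pairs for blocks that differ from the
    template; matching blocks need not be transmitted or stored."""
    tmpl_sums = block_checksums(template, block_size)
    cur_blocks = [current[i:i + block_size] for i in range(0, len(current), block_size)]
    delta = []
    for i, blk in enumerate(cur_blocks):
        csum = hashlib.sha256(blk).hexdigest()
        if i >= len(tmpl_sums) or csum != tmpl_sums[i]:
            delta.append((i, zlib.compress(blk)))  # lossless compression, as in the abstract
    return delta

def restore(template: bytes, delta, block_size: int = 4):
    """Rebuild the full checkpoint from the template plus the delta."""
    blocks = [template[i:i + block_size] for i in range(0, len(template), block_size)]
    for i, comp in delta:
        blk = zlib.decompress(comp)
        if i < len(blocks):
            blocks[i] = blk
        else:
            blocks.append(blk)
    return b"".join(blocks)

template = b"AAAABBBBCCCCDDDD"
current  = b"AAAAXXXXCCCCDDDD"   # only the second block changed
d = delta_checkpoint(template, current)
assert len(d) == 1 and d[0][0] == 1
assert restore(template, d) == current
```

In the patented scheme each node would run the comparison in parallel against a broadcast template, so only the small per-node deltas cross the network.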

  4. A Massively Parallel Solver for the Mechanical Harmonic Analysis...

    Office of Scientific and Technical Information (OSTI)

A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities. ACE3P is a 3D massively parallel simulation suite that...

  5. MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS

    SciTech Connect (OSTI)

    Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston

    2013-05-01

As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and of its internal object-oriented hybrid parallel design is given. Representative massively parallel results from several application areas are presented, and a brief discussion of future research directions for the framework is provided.

  6. Massively Parallel LES of Azimuthal Thermo-Acoustic Instabilities...

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Massively Parallel LES of Azimuthal Thermo-Acoustic Instabilities in Annular Gas Turbines Authors: Wolf, P., Staffelbach, G., Roux, A., Gicquel, L., Poinsot, T., Moureau, V. ...

  7. Massively Parallel Models of the Human Circulatory System (Conference...

    Office of Scientific and Technical Information (OSTI)

Sponsoring Org: USDOE; Country of Publication: United States; Language: English; Subject: 59 ...

  8. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and...

    Office of Scientific and Technical Information (OSTI)

Technical Report: PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

  9. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and...

    Office of Scientific and Technical Information (OSTI)

    PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes Lichtner, Peter OFM Research; Karra, Satish Los...

  10. Massively parallel mesh generation for physics codes

    SciTech Connect (OSTI)

    Hardin, D.D.

    1996-06-01

Massively parallel processors (MPPs) will soon enable realistic 3-D physical modeling of complex objects and systems. Work is planned or presently underway to port many of LLNL's physical modeling codes to MPPs. LLNL's DSI3D electromagnetics code already can solve 40+ million zone problems on the 256-processor Meiko. However, the author lacks the software necessary to generate and manipulate the large meshes needed to model many complicated 3-D geometries. State-of-the-art commercial mesh generators run on workstations and have a practical limit of several hundred thousand elements. In the foreseeable future MPPs will solve problems with a billion mesh elements. The objective of the Parallel Mesh Generation (PMESH) Project is to develop a unique mesh generation system that can construct large 3-D meshes (up to a billion elements) on MPPs. Such a capability will remove a critical roadblock to unleashing the power of MPPs for physical analysis and will put LLNL at the forefront of mesh generation technology. PMESH will "front-end" a variety of LLNL 3-D physics codes, including those in the areas of electromagnetics, structural mechanics, thermal analysis, and hydrodynamics. The DSI3D and DYNA3D codes are already running on MPPs. The primary goal of the PMESH project is to provide the robust generation of large meshes for complicated 3-D geometries through the appropriate distribution of the generation task between the user's workstation and the MPP. Secondary goals are to support the unique features of LLNL physics codes (e.g., unusual elements) and to minimize the user effort required to generate different meshes for the same geometry. PMESH's capabilities are essential because mesh generation is presently a major limiting factor in simulating larger and more complex 3-D geometries. PMESH will significantly enhance LLNL's capabilities in physical simulation by advancing the state-of-the-art in large mesh generation by 2 to 3 orders of magnitude.

  11. Efficient parallel global garbage collection on massively parallel computers

    SciTech Connect (OSTI)

    Kamada, Tomio; Matsuoka, Satoshi; Yonezawa, Akinori

    1994-12-31

On distributed-memory high-performance MPPs where processors are interconnected by an asynchronous network, efficient garbage collection (GC) becomes difficult due to inter-node references and references within pending, unprocessed messages. The parallel global GC algorithm (1) takes advantage of reference locality, (2) efficiently traverses references over nodes, (3) admits minimal pause time for ongoing computations, and (4) has been shown to scale up to 1024-node MPPs. The algorithm employs a global weight counting scheme to substantially reduce message traffic. Two methods for confirming the arrival of pending messages are used: one counts the number of messages and the other uses network 'bulldozing.' Performance evaluation in actual implementations on a multicomputer with 32-1024 nodes, the Fujitsu AP1000, reveals various favorable properties of the algorithm.
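The "global weight counting scheme" in the abstract resembles classic weighted reference counting, in which copying a remote reference splits its weight locally (no message to the owner) and only dropping a reference sends a message returning its weight; the object is reclaimable once all weight has returned. A minimal sketch under that assumption, with invented class names:

```python
class RemoteObject:
    """Owner-side record: the object dies when all lent weight returns."""
    def __init__(self, total_weight: int = 64):
        self.outstanding = total_weight   # weight currently held by references
        self.collected = False

    def return_weight(self, w: int):
        # A "weight return" message from a node that dropped a reference.
        self.outstanding -= w
        if self.outstanding == 0:
            self.collected = True

class RemoteRef:
    """A reference held on some node, carrying part of the object's weight."""
    def __init__(self, obj: RemoteObject, weight: int):
        self.obj, self.weight = obj, weight

    def duplicate(self) -> "RemoteRef":
        # Copying a reference splits its weight in half: no message to the
        # owner is needed, which is what cuts inter-node message traffic.
        half = self.weight // 2
        self.weight -= half
        return RemoteRef(self.obj, half)

    def drop(self):
        # Dropping a reference returns its weight to the owner (one message).
        self.obj.return_weight(self.weight)
        self.weight = 0

obj = RemoteObject(total_weight=64)
r1 = RemoteRef(obj, 64)       # initial reference holds all the weight
r2 = r1.duplicate()           # ships to another node without contacting the owner
r1.drop()
assert not obj.collected      # r2 still holds weight
r2.drop()
assert obj.collected          # all weight returned: object reclaimable
```

The actual algorithm layers locality-aware traversal and message-arrival confirmation on top of such counting; this sketch only shows why reference copies generate no owner traffic.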

  12. Massively Parallel LES of Azimuthal Thermo-Acoustic Instabilities in Annular Gas Turbines

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Argonne Leadership Computing Facility. Authors: Wolf, P., Staffelbach, G., Roux, A., Gicquel, L., Poinsot, T., Moureau, V. Increasingly stringent regulations and the need to tackle rising fuel prices have placed great emphasis on the design of aeronautical gas turbines, which are unfortunately more and more prone to combustion instabilities. In the particular field of annular

  13. BlueGene/L Applications: Parallelism on a Massive Scale (Journal...

    Office of Scientific and Technical Information (OSTI)

BlueGene/L Applications: Parallelism on a Massive Scale

  14. Requirements for supercomputing in energy research: The transition to massively parallel computing

    SciTech Connect (OSTI)

    Not Available

    1993-02-01

    This report discusses: The emergence of a practical path to TeraFlop computing and beyond; requirements of energy research programs at DOE; implementation: supercomputer production computing environment on massively parallel computers; and implementation: user transition to massively parallel computing.

  15. Routing performance analysis and optimization within a massively parallel computer

    DOE Patents [OSTI]

    Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen

    2013-04-16

    An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.

  16. A massively parallel fractional step solver for incompressible flows

    SciTech Connect (OSTI)

Houzeaux, G.; Vazquez, M.; Aubry, R.; Cela, J.M.

    2009-09-20

This paper presents a parallel implementation of fractional solvers for the incompressible Navier-Stokes equations using an algebraic approach. Under this framework, predictor-corrector and incremental projection schemes are seen as sub-classes of the same class, making their differences and similarities apparent. An additional advantage of this approach is that it sets a common basis for a parallelization strategy, which can be extended to other split techniques or to compressible flows. The predictor-corrector scheme consists of solving the momentum equation and a modified 'continuity' equation (namely a simple iteration for the pressure Schur complement) consecutively in order to converge to the monolithic solution, thus avoiding fractional errors. On the other hand, the incremental projection scheme solves only one iteration of the predictor-corrector per time step and adds a correction equation to fulfill mass conservation. As shown in the paper, these two schemes are very well suited for massively parallel implementation. In fact, when compared with monolithic schemes, simpler solvers and preconditioners can be used to solve the non-symmetric momentum equations (GMRES, Bi-CGSTAB) and the symmetric continuity equation (CG, Deflated CG). This gives the algorithm good speedup properties. The implementation of the mesh partitioning technique is presented, as well as the parallel performance and speedups for thousands of processors.
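The algebraic predictor-corrector idea, a momentum solve alternated with a simple iteration on the pressure Schur complement, can be illustrated on a toy saddle-point system. A Uzawa-style relaxation stands in for the paper's scheme here; the matrices and parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
W = rng.standard_normal((n, n))
A = W @ W.T + n * np.eye(n)        # SPD "momentum" matrix
B = rng.standard_normal((m, n))    # discrete divergence operator
f = rng.standard_normal(n)

# Monolithic saddle-point solution, for reference:  A u + B^T p = f,  B u = 0
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
u_mono = np.linalg.solve(K, np.concatenate([f, np.zeros(m)]))[:n]

# Predictor-corrector split: a momentum solve alternated with a simple
# relaxation on the pressure Schur complement S = B A^{-1} B^T
S = B @ np.linalg.solve(A, B.T)
alpha = 1.0 / np.linalg.eigvalsh(S).max()   # step size that makes the iteration contract
p = np.zeros(m)
for _ in range(3000):
    u = np.linalg.solve(A, f - B.T @ p)     # predictor: momentum equation
    p = p + alpha * (B @ u)                 # corrector: 'continuity' iteration
assert np.allclose(u, u_mono, atol=1e-5)    # converges to the monolithic solution
```

Each sub-solve involves only the (non-symmetric) momentum matrix or the (symmetric) Schur complement, which is why simpler Krylov solvers suffice once the system is split this way.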

  17. Massively parallel processor networks with optical express channels

    DOE Patents [OSTI]

    Deri, R.J.; Brooks, E.D. III; Haigh, R.E.; DeGroot, A.J.

    1999-08-24

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination. 3 figs.

  18. Massively parallel processor networks with optical express channels

    DOE Patents [OSTI]

    Deri, Robert J.; Brooks, III, Eugene D.; Haigh, Ronald E.; DeGroot, Anthony J.

    1999-01-01

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination.

  19. Comparing current cluster, massively parallel, and accelerated systems

    SciTech Connect (OSTI)

    Barker, Kevin J; Davis, Kei; Hoisie, Adolfy; Kerbyson, Darren J; Pakin, Scott; Lang, Mike; Sancho Pitarch, Jose C

    2010-01-01

Currently there is large architectural diversity in high performance computing systems. They include 'commodity' cluster systems that optimize per-node performance for small jobs, massively parallel processors (MPPs) that optimize aggregate performance for large jobs, and accelerated systems that optimize both per-node and aggregate performance but only for applications custom-designed to take advantage of such systems. Because of these dissimilarities, meaningful comparisons of achievable performance are not straightforward. In this work we utilize a methodology that combines both empirical analysis and performance modeling to compare clusters (represented by a 4,352-core IB cluster), MPPs (represented by a 147,456-core BG/P), and accelerated systems (represented by the 129,600-core Roadrunner) across a workload of four applications. Strengths of our approach include the ability to compare architectures (as opposed to specific implementations of an architecture), to attribute each application's performance bottlenecks to characteristics unique to each system, and to explore performance scenarios in advance of their availability for measurement. Our analysis illustrates that application performance is essentially unrelated to relative peak performance, but that application performance can be both predicted and explained using modeling.

  20. SWAMP+: multiple subsequence alignment using associative massive parallelism

    SciTech Connect (OSTI)

Steinfadt, Shannon Irene [Los Alamos National Laboratory]; Baker, Johnnie W. [Kent State Univ.]

    2010-10-18

    A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
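For reference, the Smith-Waterman recurrence that SWAMP+ evaluates (in parallel along anti-diagonals on the ASC associative model) looks like this in a plain sequential sketch; the scoring parameters are illustrative, not taken from the paper.

```python
def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-1):
    """Sequential Smith-Waterman local alignment score. SWAMP+ evaluates the
    same recurrence, but all cells on one anti-diagonal in parallel."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                     # local alignment: never negative
                          H[i - 1][j - 1] + s,   # match/mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best

assert smith_waterman("AAA", "AAA") == 6   # three matches at +2 each
assert smith_waterman("ABC", "XYZ") == 0   # no positive-scoring local alignment
```

Because cells on an anti-diagonal depend only on earlier diagonals, m processors can sweep the m+n-1 diagonals in O(m+n) steps, matching the complexity quoted in the abstract.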

  1. Factorization of large integers on a massively parallel computer

    SciTech Connect (OSTI)

    Davis, J.A.; Holdridge, D.B.

    1988-01-01

Our interest in integer factorization at Sandia National Laboratories is motivated by cryptographic applications and in particular the security of the RSA encryption-decryption algorithm. We have implemented our version of the quadratic sieve procedure on the NCUBE computer with 1024 processors (nodes). The new code is significantly different in all important aspects from the program used to factor numbers of order 10^70 on a single-processor CRAY computer. Capabilities of parallel processing and the limitation of small local memory necessitated this entirely new implementation. This effort involved several restarts as realizations of program structures that seemed appealing bogged down due to inter-processor communications. We are presently working with integers of magnitude about 10^70 in tuning this code to the novel hardware. 6 refs., 3 figs.

  2. High performance computing in chemistry and massively parallel computers: A simple transition?

    SciTech Connect (OSTI)

    Kendall, R.A.

    1993-03-01

A review of the various problems facing any software developer targeting massively parallel processing (MPP) systems is presented. Issues specific to computational chemistry application software will also be outlined. Computational chemistry software ported to and designed for the Intel Touchstone Delta Supercomputer will be discussed. Recommendations for future directions will also be made.

  3. Large-eddy simulation of the Rayleigh-Taylor instability on a massively parallel computer

    SciTech Connect (OSTI)

    Amala, P.A.K.

    1995-03-01

    A computational model for the solution of the three-dimensional Navier-Stokes equations is developed. This model includes a turbulence model: a modified Smagorinsky eddy-viscosity with a stochastic backscatter extension. The resultant equations are solved using finite difference techniques: the second-order explicit Lax-Wendroff schemes. This computational model is implemented on a massively parallel computer. Programming models on massively parallel computers are next studied. It is desired to determine the best programming model for the developed computational model. To this end, three different codes are tested on a current massively parallel computer: the CM-5 at Los Alamos. Each code uses a different programming model: one is a data parallel code; the other two are message passing codes. Timing studies are done to determine which method is the fastest. The data parallel approach turns out to be the fastest method on the CM-5 by at least an order of magnitude. The resultant code is then used to study a current problem of interest to the computational fluid dynamics community. This is the Rayleigh-Taylor instability. The Lax-Wendroff methods handle shocks and sharp interfaces poorly. To this end, the Rayleigh-Taylor linear analysis is modified to include a smoothed interface. The linear growth rate problem is then investigated. Finally, the problem of the randomly perturbed interface is examined. Stochastic backscatter breaks the symmetry of the stationary unstable interface and generates a mixing layer growing at the experimentally observed rate. 115 refs., 51 figs., 19 tabs.
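As a reminder of the discretization involved, a second-order explicit Lax-Wendroff update for the much simpler 1D linear advection equation (not the paper's 3D Navier-Stokes system) can be sketched as follows; the grid size and Courant number are illustrative.

```python
import numpy as np

def lax_wendroff_step(u, c):
    """One Lax-Wendroff step for 1D linear advection u_t + a u_x = 0
    on a periodic grid, with Courant number c = a*dt/dx."""
    up = np.roll(u, -1)   # u_{i+1}
    um = np.roll(u, 1)    # u_{i-1}
    return u - 0.5 * c * (up - um) + 0.5 * c**2 * (up - 2 * u + um)

# Advect a smooth bump once around a periodic domain and compare to the start.
n = 200
x = np.linspace(0.0, 1.0, n, endpoint=False)
u0 = np.exp(-200 * (x - 0.5) ** 2)
c = 0.5                      # stable for |c| <= 1
u = u0.copy()
for _ in range(int(n / c)):  # n/c steps move the profile one full period
    u = lax_wendroff_step(u, c)
assert float(np.max(np.abs(u - u0))) < 0.05   # second order: small dispersion error
```

The smooth-bump choice matters: as the abstract notes, Lax-Wendroff methods handle shocks and sharp interfaces poorly, which is why the Rayleigh-Taylor analysis there is modified to use a smoothed interface.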

  4. Analysis of gallium arsenide deposition in a horizontal chemical vapor deposition reactor using massively parallel computations

    SciTech Connect (OSTI)

    Salinger, A.G.; Shadid, J.N.; Hutchinson, S.A.

    1998-01-01

    A numerical analysis of the deposition of gallium from trimethylgallium (TMG) and arsine in a horizontal CVD reactor with tilted susceptor and a three inch diameter rotating substrate is performed. The three-dimensional model includes complete coupling between fluid mechanics, heat transfer, and species transport, and is solved using an unstructured finite element discretization on a massively parallel computer. The effects of three operating parameters (the disk rotation rate, inlet TMG fraction, and inlet velocity) and two design parameters (the tilt angle of the reactor base and the reactor width) on the growth rate and uniformity are presented. The nonlinear dependence of the growth rate uniformity on the key operating parameters is discussed in detail. Efficient and robust algorithms for massively parallel reacting flow simulations, as incorporated into our analysis code MPSalsa, make detailed analysis of this complicated system feasible.

  5. Massively parallel Monte Carlo for many-particle simulations on GPUs

    SciTech Connect (OSTI)

    Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.

    2013-12-01

    Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
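One way to see how such a scheme can obey detailed balance is a checkerboard cell decomposition: cells of the same color are far enough apart that moves in them cannot interact, so they may be updated concurrently, provided each move stays inside its cell and the color order is randomized. The following is a simplified serial sketch of that idea, not the paper's actual GPU implementation (which also randomizes the grid offset); all parameters are illustrative.

```python
import random

L = 8.0          # periodic box side
sigma = 1.0      # hard-disk diameter
cell = 2.0       # cell side >= 2*sigma, so same-color cells never interact
ncell = int(L / cell)

def overlaps(p, q):
    """Hard-disk overlap test with periodic (minimum-image) boundaries."""
    dx = abs(p[0] - q[0]); dx = min(dx, L - dx)
    dy = abs(p[1] - q[1]); dy = min(dy, L - dy)
    return dx * dx + dy * dy < sigma * sigma

def cell_of(p):
    return (int(p[0] / cell) % ncell, int(p[1] / cell) % ncell)

def sweep(disks, delta=0.2, rng=random):
    """One checkerboard sweep: the four colors of a 2x2 cell tiling are
    processed in random order; within one color every cell could be handled
    by a different thread, because moves are confined to their cell and
    same-color cells are independent."""
    colors = [(0, 0), (0, 1), (1, 0), (1, 1)]
    rng.shuffle(colors)                     # random color order each sweep
    for color in colors:
        for i, p in enumerate(disks):
            c = cell_of(p)
            if (c[0] % 2, c[1] % 2) != color:
                continue
            new = ((p[0] + rng.uniform(-delta, delta)) % L,
                   (p[1] + rng.uniform(-delta, delta)) % L)
            if cell_of(new) != c:           # move must stay inside its cell
                continue
            if any(overlaps(new, q) for j, q in enumerate(disks) if j != i):
                continue                    # hard-core rejection
            disks[i] = new

# Start from a dilute square lattice and run a few sweeps.
disks = [(i * 2.0 + 0.5, j * 2.0 + 0.5) for i in range(4) for j in range(4)]
rng = random.Random(1)
for _ in range(20):
    sweep(disks, rng=rng)
assert all(not overlaps(disks[i], disks[j])
           for i in range(len(disks)) for j in range(i + 1, len(disks)))
```

Confining moves to cells and randomizing the update order is what keeps the parallel chain's stationary distribution correct, which the hard-disk equation of state is sensitive enough to verify.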

  6. A massively parallel adaptive finite element method with dynamic load balancing

    SciTech Connect (OSTI)

    Devine, K.D.; Flaherty, J.E.; Wheat, S.R.; Maccabe, A.B.

    1993-05-01

    We construct massively parallel, adaptive finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We demonstrate parallel efficiency through computations on a 1024-processor nCUBE/2 hypercube. We also present results using adaptive p-refinement to reduce the computational cost of the method. We describe tiling, a dynamic, element-based data migration system. Tiling dynamically maintains global load balance in the adaptive method by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. We demonstrate the effectiveness of the dynamic load balancing with adaptive p-refinement examples.

  7. A massively parallel adaptive finite element method with dynamic load balancing

    SciTech Connect (OSTI)

    Devine, K.D.; Flaherty, J.E.; Wheat, S.R.; Maccabe, A.B.

    1993-12-31

    The authors construct massively parallel adaptive finite element methods for the solution of hyperbolic conservation laws. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. The resulting method is of high order and may be parallelized efficiently on MIMD computers. They demonstrate parallel efficiency through computations on a 1024-processor nCUBE/2 hypercube. They present results using adaptive p-refinement to reduce the computational cost of the method, and tiling, a dynamic, element-based data migration system that maintains global load balance of the adaptive method by overlapping neighborhoods of processors that each perform local balancing.

  8. Molecular Dynamics Simulations from SNL's Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)

    DOE Data Explorer [Office of Scientific and Technical Information (OSTI)]

    Plimpton, Steve; Thompson, Aidan; Crozier, Paul

LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator; it is a code that can be used to model atoms or, as the LAMMPS website says, to act as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html. The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.

  9. Capabilities, Implementation, and Benchmarking of Shift, a Massively Parallel Monte Carlo Radiation Transport Code

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Pandya, Tara M; Johnson, Seth R; Evans, Thomas M; Davidson, Gregory G; Hamilton, Steven P; Godfrey, Andrew T

    2016-01-01

This work discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  10. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-12-21

This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  11. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    SciTech Connect (OSTI)

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  12. A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities

    SciTech Connect (OSTI)

    O. Kononenko

    2015-02-17

ACE3P is a 3D massively parallel simulation suite developed at SLAC National Accelerator Laboratory that can perform coupled electromagnetic, thermal, and mechanical studies. Effectively utilizing supercomputer resources, ACE3P has become a key simulation tool for particle accelerator R&D. A new frequency domain solver to perform mechanical harmonic response analysis of accelerator components has been developed within the existing parallel framework. This solver is designed to determine the frequency response of the mechanical system to external harmonic excitations for time-efficient, accurate analysis of large-scale problems. Coupled with the ACE3P electromagnetic modules, this capability complements a set of multi-physics tools for a comprehensive study of microphonics in superconducting accelerating cavities, in order to understand the RF response and feedback requirements for the operational reliability of a particle accelerator.

  13. DGDFT: A massively parallel method for large scale density functional theory calculations

    SciTech Connect (OSTI)

    Hu, Wei Yang, Chao; Lin, Lin

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10{sup −4} Hartree/atom in terms of the error of energy and 6.2 × 10{sup −4} Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  14. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    SciTech Connect (OSTI)

    Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel

    2009-02-15

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
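    The betweenness-centrality kernel this record builds on is Brandes' algorithm. A minimal serial sketch (not the paper's lock-free multithreaded variant; the function name and adjacency-dict interface are illustrative assumptions) looks like:

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm: exact betweenness for an unweighted graph.

    adj: dict mapping each vertex to an iterable of neighbours.
    Returns a dict of unnormalised, directed-path betweenness scores.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Dependency accumulation in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

    The paper's contribution is making the accumulation phase lock-free and cache-friendly at scale; the serial structure above is only the starting point.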

  15. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; Anderson, Steve; Woodward, Paul; Dietz, Hank

    1995-01-01

    Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
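    The self-similarity property the translator relies on can be illustrated in a few lines. Below is a hypothetical 1-D Jacobi smoother (not from the paper): applying the same global kernel to each ghost-padded subdomain reproduces the global result exactly, which is what lets the translator distribute the code unchanged.

```python
import numpy as np

def smooth_global(u):
    """The 'global' algorithm: one Jacobi smoothing pass, endpoints fixed."""
    out = u.copy()
    out[1:-1] = 0.5 * (u[:-2] + u[2:])
    return out

def smooth_self_similar(u, nparts):
    """Fortran-P style execution: cut the domain into subdomains with one
    ghost cell per side, apply the *same* kernel to each piece, reassemble."""
    n = len(u)
    size = n // nparts
    out = u.copy()
    for p in range(nparts):
        lo, hi = p * size, (p + 1) * size
        glo, ghi = max(lo - 1, 0), min(hi + 1, n)   # ghost-cell halo
        piece = smooth_global(u[glo:ghi])
        start, end = max(lo, 1), min(hi, n - 1)      # global ends stay fixed
        out[start:end] = piece[start - glo:end - glo]
    return out
```

    For any partition count that divides the domain, the two functions agree, which is the self-similarity the Fortran-P tools test for.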

  16. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    SciTech Connect (OSTI)

    Earth Sciences Division; Zhang, Keni; Zhang, Keni; Wu, Yu-Shu; Pruess, Karsten

    2008-05-27

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. 
    This report provides a quick-start guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the standard version of the TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of the parallel methodology and code structure, as well as the mathematical and numerical methods used.

  17. Compact Graph Representations and Parallel Connectivity Algorithms for Massive Dynamic Network Analysis

    SciTech Connect (OSTI)

    Madduri, Kamesh; Bader, David A.

    2009-02-15

    Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socioeconomic interactions, social networking web sites, communication traffic, and scientific computing can be intuitively modeled as graphs. We present the first study of novel high-performance combinatorial techniques for analyzing large-scale information networks, encapsulating dynamic interaction data in the order of billions of entities. We present new data structures to represent dynamic interaction networks, and discuss algorithms for processing parallel insertions and deletions of edges in small-world networks. With these new approaches, we achieve an average performance rate of 25 million structural updates per second and a parallel speedup of nearly 28 on a 64-way Sun UltraSPARC T2 multicore processor, for insertions and deletions to a small-world network of 33.5 million vertices and 268 million edges. We also design parallel implementations of fundamental dynamic graph kernels related to connectivity and centrality queries. Our implementations are freely distributed as part of the open-source SNAP (Small-world Network Analysis and Partitioning) complex network analysis framework.
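    The update and query kernels described can be sketched with a toy structure (illustrative only; SNAP's compact representations and dynamic-connectivity kernels are far more elaborate than plain adjacency sets with BFS):

```python
class DynamicGraph:
    """Toy dynamic-network structure: adjacency sets give O(1) expected
    edge insertion/deletion; connectivity is answered by plain BFS."""

    def __init__(self):
        self.adj = {}

    def insert_edge(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def delete_edge(self, u, v):
        self.adj.get(u, set()).discard(v)
        self.adj.get(v, set()).discard(u)

    def connected(self, u, v):
        # Re-traverses on every query; a real dynamic-connectivity kernel
        # maintains auxiliary structures instead.
        seen, frontier = {u}, [u]
        while frontier:
            nxt = []
            for x in frontier:
                for y in self.adj.get(x, ()):
                    if y not in seen:
                        seen.add(y)
                        nxt.append(y)
            frontier = nxt
        return v in seen
```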

  18. 3-D readout-electronics packaging for high-bandwidth massively paralleled imager

    DOE Patents [OSTI]

    Kwiatkowski, Kris; Lyke, James

    2007-12-18

    Dense, massively parallel signal-processing electronics are co-packaged behind the associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells is disposed on a stacked CMOS chip's surface at a 45° angle from light-reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image-processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections distributing power and signals to components associated with each stacked CMOS chip layer.

  19. A Massively Parallel Sparse Eigensolver for Structural Dynamics Finite Element Analysis

    SciTech Connect (OSTI)

    Day, David M.; Reese, G.M.

    1999-05-01

    Eigenanalysis is a critical component of structural dynamics which is essential for determining the vibrational response of systems. This effort addresses the development of numerical algorithms associated with scalable eigensolver techniques suitable for use on massively parallel, distributed memory computers that are capable of solving large scale structural dynamics problems. An iterative Lanczos method was determined to be the best choice for the application. Scalability of the eigenproblem depends on scalability of the underlying linear solver. A multi-level solver (FETI) was selected as most promising for this component. Issues relating to heterogeneous materials, mechanisms and multipoint constraints have been examined, and the linear solver algorithm has been developed to incorporate features that result in a scalable, robust algorithm for practical structural dynamics applications. The resulting tools have been demonstrated on large problems representative of a weapons system.

  20. GPAW - massively parallel electronic structure calculations with Python-based software.

    SciTech Connect (OSTI)

    Enkovaara, J.; Romero, N.; Shende, S.; Mortensen, J.

    2011-01-01

    Electronic structure calculations are a widely used tool in materials science and a large consumer of supercomputing resources. Traditionally, the software packages for these kinds of simulations have been implemented in compiled languages, where Fortran in its different versions has been the most popular choice. While dynamic, interpreted languages such as Python can increase the efficiency of the programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most of the productivity-enhancing features together with good numerical performance. We have used this approach in implementing the electronic structure simulation software GPAW using the combination of the Python and C programming languages. While the chosen approach works well on standard workstations and in Unix environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges it is possible to obtain good numerical performance and good parallel scalability with Python-based software.

  1. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

    2015-04-08

    The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.
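    The abstract leaves the optimizer's encoding and operators unspecified; a toy permutation GA for task reordering (all names, operators, and the cost model are illustrative assumptions, not the paper's method) could look like:

```python
import random

def comm_cost(perm, comm, dist):
    """Cost of placing task i on node perm[i]: sum over task pairs of
    (communication volume) x (interconnect distance between their nodes)."""
    n = len(perm)
    return sum(comm[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def ga_reorder(comm, dist, pop_size=30, gens=200, seed=0):
    """Tiny steady-state permutation GA: binary tournament selection plus
    swap mutation, replacing the worst individual each generation."""
    rng = random.Random(seed)
    n = len(comm)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = min(pop, key=lambda p: comm_cost(p, comm, dist))
    for _ in range(gens):
        a, b = rng.sample(pop, 2)
        child = min(a, b, key=lambda p: comm_cost(p, comm, dist))[:]
        i, j = rng.sample(range(n), 2)
        child[i], child[j] = child[j], child[i]      # swap mutation
        worst = max(range(pop_size),
                    key=lambda k: comm_cost(pop[k], comm, dist))
        pop[worst] = child
        if comm_cost(child, comm, dist) < comm_cost(best, comm, dist):
            best = child
    return best
```

    In the paper's setting, `dist` would come from the Titan interconnect topology and `comm` from the application's measured communication pattern.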

  2. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

    SciTech Connect (OSTI)

    Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

    2015-04-08

    The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.

  3. Tracking the roots of cellulase hyperproduction by the fungus Trichoderma reesei using massively parallel DNA sequencing

    SciTech Connect (OSTI)

    Le Crom, Stéphane; Schackwitz, Wendy; Pennacchio, Len; Magnuson, Jon K.; Culley, David E.; Collett, James R.; Martin, Joel X.; Druzhinina, Irina S.; Mathis, Hugues; Monot, Frédéric; Seiboth, Bernhard; Cherry, Barbara; Rey, Michael; Berka, Randy; Kubicek, Christian P.; Baker, Scott E.; Margeot, Antoine

    2009-09-22

    Trichoderma reesei (teleomorph Hypocrea jecorina) is the main industrial source of cellulases and hemicellulases harnessed for the hydrolysis of biomass to simple sugars, which can then be converted to biofuels, such as ethanol, and other chemicals. The highly productive strains in use today were generated by classical mutagenesis. To learn how cellulase production was improved by these techniques, we performed massively parallel sequencing to identify mutations in the genomes of two hyperproducing strains (NG14, and its direct improved descendant, RUT C30). We detected a surprisingly high number of mutagenic events: 223 single nucleotide variants, 15 small deletions or insertions, and 18 larger deletions leading to the loss of more than 100 kb of genomic DNA. From these events we report previously undocumented non-synonymous mutations in 43 genes that are mainly involved in nuclear transport, mRNA stability, transcription, secretion/vacuolar targeting, and metabolism. This homogeneity of functional categories suggests that multiple changes are necessary to improve cellulase production and not simply a few clear-cut mutagenic events. Phenotype microarrays show that some of these mutations result in strong changes in the carbon assimilation pattern of the two mutants with respect to the wild-type strain QM6a. Our analysis provides the first genome-wide insights into the changes induced by classical mutagenesis in a filamentous fungus, and suggests new areas for the generation of enhanced T. reesei strains for industrial applications such as biofuel production.

  4. Massively parallel computation of 3D flow and reactions in chemical vapor deposition reactors

    SciTech Connect (OSTI)

    Salinger, A.G.; Shadid, J.N.; Hutchinson, S.A.; Hennigan, G.L.; Devine, K.D.; Moffat, H.K.

    1997-12-01

    Computer modeling of Chemical Vapor Deposition (CVD) reactors can greatly aid in the understanding, design, and optimization of these complex systems. Modeling is particularly attractive in these systems since the costs of experimentally evaluating many design alternatives can be prohibitively expensive, time consuming, and even dangerous, when working with toxic chemicals like Arsine (AsH{sub 3}). Until now, predictive modeling has not been possible for most systems since the behavior is three-dimensional and governed by complex reaction mechanisms. In addition, CVD reactors often exhibit large thermal gradients, large changes in physical properties over regions of the domain, and significant thermal diffusion for gas mixtures with widely varying molecular weights. As a result, significant simplifications in the models have been made which erode the accuracy of the models' predictions. In this paper, the authors will demonstrate how the vast computational resources of massively parallel computers can be exploited to make possible the analysis of models that include coupled fluid flow and detailed chemistry in three-dimensional domains. For the most part, models have either simplified the reaction mechanisms and concentrated on the fluid flow, or have simplified the fluid flow and concentrated on rigorous reactions. An important CVD research thrust has been in detailed modeling of fluid flow and heat transfer in the reactor vessel, treating transport and reaction of chemical species either very simply or as a totally decoupled problem. Using the analogy between heat transfer and mass transfer, and the fact that deposition is often diffusion limited, much can be learned from these calculations; however, the effects of thermal diffusion, the change in physical properties with composition, and the incorporation of surface reaction mechanisms are not included in this model, nor can transitions to three-dimensional flows be detected.

  5. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOE Patents [OSTI]

    Karasick, M.S.; Strip, D.R.

    1996-01-30

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.

  6. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOE Patents [OSTI]

    Karasick, Michael S.; Strip, David R.

    1996-01-01

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
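    The directed-edge structure the two patent records describe can be sketched as a small record type plus a per-face constructor (field names and the concrete layout are guesses; the patents do not give one):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DEdge:
    """Directed-edge record: one edge of the solid, oriented so that it is
    uniquely associated with a single face of the object."""
    tail: tuple    # (x, y, z) of the edge's start vertex
    head: tuple    # (x, y, z) of the edge's end vertex
    face: str      # label of the one face this d-edge bounds

def dedges_for_face(face_label, vertices):
    """Build the d-edge loop bounding one polygonal face, given its vertices
    in counter-clockwise order: a sketch of what each parallel processor
    would construct independently for its share of the model."""
    n = len(vertices)
    return [DEdge(vertices[i], vertices[(i + 1) % n], face_label)
            for i in range(n)]
```

    Because each d-edge carries its own vertex coordinates and face label, a processor can operate on its share of the loop without consulting any other processor, which is the no-intercommunication property the patents claim.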

  7. Method and apparatus for obtaining stack traceback data for multiple computing nodes of a massively parallel computer system

    DOE Patents [OSTI]

    Gooding, Thomas Michael; McCarthy, Patrick Joseph

    2010-03-02

    A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.
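    The grouping step the patent describes — cluster nodes by a cheap partial sample, then fetch full tracebacks only from representatives — can be sketched in a few lines (function and variable names are illustrative, not from the patent):

```python
from collections import defaultdict

def group_nodes_by_address(partial_tracebacks):
    """Group node ids by their sampled instruction address.

    partial_tracebacks: dict mapping node_id -> instruction address.
    Returns (groups, representatives): groups maps each address to the
    list of nodes stopped there; representatives picks one node per group
    for the follow-up retrieval of the full call-return stack.
    """
    groups = defaultdict(list)
    for node, addr in sorted(partial_tracebacks.items()):
        groups[addr].append(node)
    representatives = {addr: nodes[0] for addr, nodes in groups.items()}
    return dict(groups), representatives
```

    On a machine with tens of thousands of nodes, this reduces the expensive full-stack retrievals from one per node to one per distinct instruction address.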

  8. Massively parallel processing on the Intel Paragon system: One tool in achieving the goals of the Human Genome Project

    SciTech Connect (OSTI)

    Ecklund, D.J.

    1993-12-31

    A massively parallel computing system is one tool that has been adopted by researchers in the Human Genome Project. This tool is one of many in a toolbox of theories, algorithms, and systems that are used to attack the many questions posed by the project. A good tool functions well when applied alone to the problem for which it was devised. A superior tool achieves its solitary goal, and supports and interacts with other tools to achieve goals beyond the scope of any individual tool. The author believes that Intel's massively parallel Paragon{trademark} XP/S system is a superior tool. This paper presents specific requirements for a superior computing tool for the Human Genome Project (HGP) and shows how the Paragon system addresses these requirements. Computing requirements for HGP are based on three factors: (1) computing requirements of algorithms currently used in sequence homology, protein folding, and database insertion/retrieval; (2) estimates of the computing requirements of new applications arising from evolving biological theories; and (3) the requirements for facilities that support collaboration among scientists in a project of this magnitude. The Paragon system provides many hardware and software features that effectively address these requirements.

  9. A Novel Algorithm for Solving the Multidimensional Neutron Transport Equation on Massively Parallel Architectures

    SciTech Connect (OSTI)

    Azmy, Yousry

    2014-06-10

    We employ the Integral Transport Matrix Method (ITMM) as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iteration (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells' fluxes and between the cells' and boundary surfaces' fluxes. The main goals of this work are to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and parallel performance of the developed methods with increasing number of processes, P. The fastest observed parallel solution method, Parallel Gauss-Seidel (PGS), was used in a weak scaling comparison with the PARTISN transport code, which uses the source iteration (SI) scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method, even without acceleration or preconditioning, is competitive for optically thick problems as P is increased to the tens of thousands range. For the most optically thick cells tested, PGS reduced execution time by an approximate factor of three for problems with more than 130 million computational cells on P = 32,768. Moreover, the SI-DSA execution time trend rises generally more steeply with increasing P than the PGS trend. Furthermore, the PGS method outperforms SI and SI-DSA on as few as P = 16 for the periodic heterogeneous layers (PHL) configuration problems, and reduces execution time by a factor of ten or more for all problems considered with more than 2 million computational cells on P = 4,096.
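    A Gauss-Seidel iteration of the kind PGS parallelizes can be sketched in generic linear-system form (this is the textbook serial scheme, not the actual ITMM operators or their parallel decomposition):

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-10, max_iter=10_000):
    """Solve Ax = b by Gauss-Seidel sweeps: each unknown is updated in
    place using the newest values of the others. Convergence is assumed
    here via diagonal dominance of A."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # Use already-updated entries x[:i] and old entries x[i+1:].
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            break
    return x
```

    In the ITMM setting, the matrix rows correspond to the stored cell-to-cell coupling operators, and the parallelization question the paper studies is how to distribute these sweeps across P processes.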

  10. Analysis and selection of optimal function implementations in massively parallel computer

    DOE Patents [OSTI]

    Archer, Charles Jens; Peters, Amanda; Ratterman, Joseph D.

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
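    The collect-then-dispatch scheme the patent outlines can be sketched as two small functions (names and the nearest-size selection rule are illustrative assumptions; the patent's selection program code is not specified here):

```python
import time

def profile_implementations(impls, sample_inputs, repeats=3):
    """Time each candidate implementation on each sample input and record
    the best-performing one per profiled input: a simplified version of
    the patent's performance-data collection."""
    table = {}
    for x in sample_inputs:
        timings = {}
        for name, fn in impls.items():
            t0 = time.perf_counter()
            for _ in range(repeats):
                fn(x)
            timings[name] = time.perf_counter() - t0
        table[x] = min(timings, key=timings.get)
    return table

def make_selector(impls, table, default):
    """Return a dispatcher that calls the implementation recorded as
    fastest for the profiled input size nearest to the actual input."""
    sizes = sorted(table)
    def call(x):
        best = min(sizes, key=lambda s: abs(s - x)) if sizes else None
        return impls[table.get(best, default)](x)
    return call
```

    In practice all candidate implementations compute the same function; the dispatcher merely routes each call to whichever one profiled fastest for inputs of that shape.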

  11. LiNbO{sub 3}: A photovoltaic substrate for massive parallel manipulation and patterning of nano-objects

    SciTech Connect (OSTI)

    Carrascosa, M.; García-Cabañes, A.; Jubera, M.; Ramiro, J. B.; Agulló-López, F.

    2015-12-15

    The application of evanescent photovoltaic (PV) fields, generated by visible illumination of Fe:LiNbO{sub 3} substrates, for massive parallel trapping and manipulation of micro- and nano-objects is critically reviewed. The technique has often been referred to as photovoltaic or photorefractive tweezers. The main advantage of the new method is that the involved electrophoretic and/or dielectrophoretic forces do not require any electrodes, and large scale manipulation of nano-objects can be easily achieved using the patterning capabilities of light. The paper describes the experimental techniques for particle trapping and the main reported experimental results obtained with a variety of micro- and nano-particles (dielectric and conductive) and different illumination configurations (single beam, holographic geometry, and spatial light modulator projection). The report also pays attention to the physical basis of the method, namely, the coupling of the evanescent photorefractive fields to the dielectric response of the nano-particles. The role of a number of physical parameters such as the contrast and spatial periodicities of the illumination pattern or the particle deposition method is discussed. Moreover, the main properties of the obtained particle patterns in relation to potential applications are summarized, and first demonstrations are reviewed. Finally, the PV method is discussed in comparison to other patterning strategies, such as those based on the pyroelectric response and the electric fields associated with domain poling of ferroelectric materials.

  12. Massively-parallel electron dynamics calculations in real-time and real-space: Toward applications to nanostructures of more than ten-nanometers in size

    SciTech Connect (OSTI)

    Noda, Masashi; Ishimura, Kazuya; Nobusada, Katsuyuki; Yabana, Kazuhiro; Boku, Taisuke

    2014-05-15

    A highly efficient program of massively parallel calculations for electron dynamics has been developed in an effort to apply the method to the optical response of nanostructures more than ten nanometers in size. The approach is based on time-dependent density functional theory calculations in real-time and real-space. The computational code is implemented by using simple algorithms with a finite-difference method for the space derivative and a Taylor expansion for the time-propagation. Since the computational program is free from the algorithms of eigenvalue problems and fast-Fourier-transformation, which are usually implemented in conventional quantum chemistry or band structure calculations, it is highly suitable for massively parallel calculations. Benchmark calculations using the K computer at RIKEN demonstrate that the parallel efficiency of the program is very high on more than 60,000 CPU cores. The method is applied to the optical response of ordered arrays of C{sub 60} nanostructures more than 10 nm in size. The computed absorption spectrum is in good agreement with the experimental observation.
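    The Taylor-expansion time propagation the abstract describes amounts to truncating psi(t+dt) = exp(-i H dt) psi(t) at a low order. A minimal dense-matrix sketch for a toy two-level Hamiltonian (the function name and the fourth-order truncation are illustrative assumptions; the production code works on real-space grids):

```python
import numpy as np

def taylor_step(H, psi, dt, order=4):
    """One step of real-time electron dynamics:
    psi(t+dt) ~= sum_{n=0..order} (-i H dt)^n / n! psi(t).
    Each term is built from the previous one, so only matrix-vector
    products are needed: no diagonalisation, no FFTs."""
    term, out = psi.copy(), psi.copy()
    for n in range(1, order + 1):
        term = (-1j * dt / n) * (H @ term)   # (-i dt)^n H^n / n! psi
        out = out + term
    return out
```

    For small dt the truncated propagator is very close to unitary, so the norm of the state is preserved to the truncation order.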

  13. Method and apparatus for analyzing error conditions in a massively parallel computer system by identifying anomalous nodes within a communicator set

    DOE Patents [OSTI]

    Gooding, Thomas Michael

    2011-04-19

    An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system, and identifies nodes which exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes which do not themselves belong to the group. A node, not itself in the group, having a large number of neighbors in the group, is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to number of adjoining members of the group.

  14. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by dynamically adjusting local routing strategies

    DOE Patents [OSTI]

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-03-16

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Each node implements a respective routing strategy for routing data through the network, the routing strategies not necessarily being the same in every node. The routing strategies implemented in the nodes are dynamically adjusted during application execution to shift network workload as required. Preferably, adjustment of routing policies in selective nodes is performed at synchronization points. The network may be dynamically monitored, and routing strategies adjusted according to detected network conditions.

  15. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by employing bandwidth shells at areas of overutilization

    DOE Patents [OSTI]

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-04-27

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a final destination. The default routing strategy is altered responsive to detection of overutilization of a particular path of one or more links, and at least some traffic is re-routed by distributing the traffic among multiple paths (which may include the default path). An alternative path may require a greater number of link traversals to reach the destination node.

  16. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by routing through transporter nodes

    DOE Patents [OSTI]

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-11-16

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a destination. Some packets are constrained to be routed through respective designated transporter nodes, the automated routing strategy determining a path from a respective source node to a respective transporter node, and from a respective transporter node to a respective destination node. Preferably, the source node chooses a routing policy from among multiple possible choices, and that policy is followed by all intermediate nodes. The use of transporter nodes allows greater flexibility in routing.

  17. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by dynamic global mapping of contended links

    DOE Patents [OSTI]

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2011-10-04

    A massively parallel nodal computer system periodically collects and broadcasts usage data for an internal communications network. A node sending data over the network makes a global routing determination using the network usage data. Preferably, network usage data comprises an N-bit usage value for each output buffer associated with a network link. An optimum routing is determined by summing the N-bit values associated with each link through which a data packet must pass, and comparing the sums associated with different possible routes.
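    The cost comparison described here, summing a usage value per link over each candidate route, can be sketched directly. The link names and usage numbers below are invented:

```python
# Hypothetical per-link usage values (the patent's N-bit output-buffer
# counters, collected and broadcast periodically).
usage = {("A", "B"): 7, ("B", "D"): 6, ("A", "C"): 2, ("C", "D"): 3}

def route_cost(route):
    # route is a node sequence; cost = sum of usage over its hops
    return sum(usage[(a, b)] for a, b in zip(route, route[1:]))

candidates = [("A", "B", "D"), ("A", "C", "D")]
best = min(candidates, key=route_cost)
print(best, route_cost(best))  # -> ('A', 'C', 'D') 5
```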

  18. User's guide of TOUGH2-EGS-MP: A Massively Parallel Simulator with Coupled Geomechanics for Fluid and Heat Flow in Enhanced Geothermal Systems VERSION 1.0

    SciTech Connect (OSTI)

    Xiong, Yi; Fakcharoenphol, Perapon; Wang, Shihao; Winterfeld, Philip H.; Zhang, Keni; Wu, Yu-Shu

    2013-12-01

    TOUGH2-EGS-MP is a parallel numerical simulation program coupling geomechanics with fluid and heat flow in fractured and porous media, and is applicable for simulation of enhanced geothermal systems (EGS). TOUGH2-EGS-MP is based on the TOUGH2-MP code, the massively parallel version of TOUGH2. In TOUGH2-EGS-MP, the fully-coupled flow-geomechanics model is developed from linear elastic theory for thermo-poro-elastic systems and is formulated in terms of mean normal stress as well as pore pressure and temperature. Reservoir rock properties such as porosity and permeability depend on rock deformation, and the relationships between these two, obtained from poro-elasticity theories and empirical correlations, are incorporated into the simulation. This report provides the user with detailed information on the TOUGH2-EGS-MP mathematical model and instructions for using it for Thermal-Hydrological-Mechanical (THM) simulations. The mathematical model includes the fluid and heat flow equations, geomechanical equation, and discretization of those equations. In addition, the parallel aspects of the code, such as domain partitioning and communication between processors, are also included. Although TOUGH2-EGS-MP has the capability for simulating fluid and heat flows coupled with geomechanical effects, it is up to the user to select the specific coupling process, such as THM or only TH, in a simulation. There are several example problems illustrating applications of this program. These example problems are described in detail and their input data are presented. Their results demonstrate that this program can be used for field-scale geothermal reservoir simulation in porous and fractured media with fluid and heat flow coupled with geomechanical effects.

  19. CX-001635: Categorical Exclusion Determination

    Broader source: Energy.gov [DOE]

    Solar American Institute Incubator - Semprius - Massively Parallel Microcell-Based Module Array. CX(s) Applied: B3.6. Date: 04/08/2010. Location(s): Durham, North Carolina. Office(s): Energy Efficiency and Renewable Energy, Golden Field Office

  20. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by semi-randomly varying routing policies for different packets

    DOE Patents [OSTI]

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-11-23

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Nodes vary a choice of routing policy for routing data in the network in a semi-random manner, so that similarly situated packets are not always routed along the same path. Semi-random variation of the routing policy tends to avoid certain local hot spots of network activity, which might otherwise arise using more consistent routing determinations. Preferably, the originating node chooses a routing policy for a packet, and all intermediate nodes in the path route the packet according to that policy. Policies may be rotated on a round-robin basis, selected by generating a random number, or otherwise varied.
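    The two variation schemes named in this abstract, round-robin rotation and random selection, amount to a few lines each. The policy names and packet fields below are illustrative, not from the patent:

```python
import itertools
import random

POLICIES = ["x_first", "y_first", "adaptive"]  # hypothetical policy names

# Round-robin variation: successive packets from a node cycle the policies,
# so similarly situated packets do not always take the same path.
rr = itertools.cycle(POLICIES)
first_six = [next(rr) for _ in range(6)]
print(first_six)

# Random variation: each packet draws a policy; the chosen policy travels
# with the packet so that every intermediate node honors the same choice.
packet = {"dest": (3, 2), "policy": random.choice(POLICIES)}
print(packet["policy"] in POLICIES)  # -> True
```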

  1. Parallel computing works

    SciTech Connect (OSTI)

    Not Available

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five-year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  2. High-speed massively parallel scanning

    DOE Patents [OSTI]

    Decker, Derek E.

    2010-07-06

    A new technique for recording a series of images of a high-speed event (such as, but not limited to, ballistics, explosives, or laser-induced changes in materials) is presented. The technique makes use of a lenslet array to take image picture elements (pixels) and concentrate light from each pixel into a spot that is much smaller than the pixel. This array of spots illuminates a detector region (e.g., film, in one embodiment) which is scanned transverse to the light, creating tracks of exposed regions. Each track is a time history of the light intensity for a single pixel. By appropriately configuring the array of concentrated spots with respect to the scanning direction of the detection material, the tracks fit between pixels and can reach lengths sufficient for several high-speed imaging applications.

  3. Parallel Dislocation Simulator

    Energy Science and Technology Software Center (OSTI)

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  4. Differences Between Distributed and Parallel Systems

    SciTech Connect (OSTI)

    Brightwell, R.; Maccabe, A.B.; Riesen, R.

    1998-10-01

    Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.

  5. Ultrascalable petaflop parallel supercomputer

    DOE Patents [OSTI]

    Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  6. TRANSIMS Parallelization

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    TRANSIMS Parallelization. Background: TRANSIMS was originally developed by Los Alamos National Laboratory to run exclusively on a Linux cluster environment. In this initial version, the only parallelized component was the microsimulator. It worked

  7. Massively Parallel Models of the Human Circulatory System (Conference...

    Office of Scientific and Technical Information (OSTI)

    Language: English. Subject: 59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE

  8. PARALLEL HOP: A SCALABLE HALO FINDER FOR MASSIVE COSMOLOGICAL...

    Office of Scientific and Technical Information (OSTI)

    Center for Astrophysics and Space Sciences, University of California, San Diego, CA 92093 ... Subject: 79 ASTROPHYSICS, COSMOLOGY AND ASTRONOMY; DATA ANALYSIS; DISTRIBUTION; GALAXIES; ...

  9. Parallel Batch Scripts

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Parallel Batch Scripts. Parallel Environments on Genepool: You can run parallel jobs that use MPI or OpenMP on Genepool as long as you make the appropriate changes to your submission script. To investigate the parallel environments that are available on Genepool, you can use: qconf -sp <pename> (show the configuration for the specified parallel environment); qconf -spl (show a list of all currently configured parallel environments). Basic Parallel

  10. Supertwistors and massive particles

    SciTech Connect (OSTI)

    Mezincescu, Luca; Routh, Alasdair J.; Townsend, Paul K.

    2014-07-15

    In the (super)twistor formulation of massless (super)particle mechanics, the mass-shell constraint is replaced by a “spin-shell” constraint from which the spin content can be read off. We extend this formalism to massive (super)particles (with N-extended space–time supersymmetry) in three and four space–time dimensions, explaining how the spin-shell constraints are related to spin, and we use it to prove equivalence of the massive N=1 and BPS-saturated N=2 superparticle actions. We also find the supertwistor form of the action for “spinning particles” with N-extended worldline supersymmetry, massless in four dimensions and massive in three dimensions, and we show how this simplifies special features of the N=2 case. -- Highlights: •Spin-shell constraints are related to Poincaré Casimirs. •Twistor form of 4D spinning particle for spin N/2. •Twistor proof of scalar/antisymmetric tensor equivalence for 4D spin 0. •Twistor form of 3D particle with arbitrary spin. •Proof of equivalence of N=1 and N=2 BPS massive 4D superparticles.

  11. Special parallel processing workshop

    SciTech Connect (OSTI)

    1994-12-01

    This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts of parallel processing.

  12. Parallel Python GDB

    Energy Science and Technology Software Center (OSTI)

    2012-08-05

    PGDB is a lightweight parallel debugger software product. It utilizes the open-source gdb debugger inside of a parallel Python framework.

  13. A Comprehensive Look at High Performance Parallel I/O

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    A Comprehensive Look at High Performance Parallel I/O. Book Signing @ SC14! Nov. 18, 5 p.m. in Booth 1939. November 10, 2014. Contact: Linda Vu, +1 510 495 2402, lvu@lbl.gov. In the 1990s, high performance computing (HPC) made a dramatic transition to massively parallel processors. As this model solidified over the next 20 years, supercomputing performance increased from gigaflops (billions of calculations per second) to

  14. Applications of Parallel Computers

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Applications of Parallel Computers. UCB CS267, Spring 2015, Tuesday & Thursday, 9:30-11:00 Pacific Time. Applications of Parallel Computers, CS267, is a graduate-level course...

  15. Parallel flow diffusion battery

    DOE Patents [OSTI]

    Yeh, H.C.; Cheng, Y.S.

    1984-01-01

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  16. Parallel flow diffusion battery

    DOE Patents [OSTI]

    Yeh, Hsu-Chi; Cheng, Yung-Sung

    1984-08-07

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  17. Parallel Atomistic Simulations

    SciTech Connect (OSTI)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated-data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods, such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination, are also reviewed, and issues such as how to measure parallel efficiency (especially for parallel Monte Carlo algorithms with modified Markov chains) are discussed.
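    Of the three molecular-dynamics decompositions named in this review, the spatial decomposition is the simplest to sketch: the simulation box is split into regions and each processor owns the atoms inside its region. The slab geometry, box size, and atom count below are arbitrary illustrative choices, simulated serially:

```python
import numpy as np

# Spatial decomposition sketch: split the box into P slabs along x and
# give each "processor" the atoms inside its slab. (The replicated-data
# and force decompositions partition atoms or the force matrix instead.)
rng = np.random.default_rng(0)
box, P, n_atoms = 10.0, 4, 1000
pos = rng.uniform(0.0, box, size=(n_atoms, 3))

owner = np.floor(pos[:, 0] / (box / P)).astype(int)   # slab index per atom
domains = [np.flatnonzero(owner == p) for p in range(P)]

# Each atom lands in exactly one domain; short-range forces then need
# only halo exchanges between adjacent slabs.
print(sum(len(d) for d in domains))  # -> 1000
```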

  18. Parallel integrated thermal management

    DOE Patents [OSTI]

    Bennion, Kevin; Thornton, Matthew

    2014-08-19

    Embodiments discussed herein are directed to managing the heat content of two vehicle subsystems through a single coolant loop having parallel branches for each subsystem.

  19. Optimize Parallel Pumping Systems

    Broader source: Energy.gov [DOE]

    This tip sheet describes how to optimize the performance of multiple pumps operating continuously as part of a parallel pumping system.

  20. Thought Leaders during Crises in Massive Social Networks

    SciTech Connect (OSTI)

    Corley, Courtney D.; Farber, Robert M.; Reynolds, William

    2012-05-24

    The vast amount of social media data that can be gathered from the internet coupled with workflows that utilize both commodity systems and massively parallel supercomputers, such as the Cray XMT, open new vistas for research to support health, defense, and national security. Computer technology now enables the analysis of graph structures containing more than 4 billion vertices joined by 34 billion edges along with metrics and massively parallel algorithms that exhibit near-linear scalability according to number of processors. The challenge lies in making this massive data and analysis comprehensible to an analyst and end-users that require actionable knowledge to carry out their duties. Simply stated, we have developed language and content agnostic techniques to reduce large graphs built from vast media corpora into forms people can understand. Specifically, our tools and metrics act as a survey tool to identify thought leaders' -- those members that lead or reflect the thoughts and opinions of an online community, independent of the source language.

  1. Eclipse Parallel Tools Platform

    Energy Science and Technology Software Center (OSTI)

    2005-02-18

    Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs in these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices, and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform designed to provide a robust, full-featured, commercial-quality platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform-specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framework by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that provides only minimal functionality for parallel tool integration and support for a small number of parallel architectures

  2. Sinus histiocytosis with massive lymphadenopathy

    SciTech Connect (OSTI)

    Pastakia, B.; Weiss, S.H.

    1987-11-01

    Gallium uptake corresponding to the extent of the disease in a patient with histologically proven sinus histiocytosis with massive lymphadenopathy (SHML) is reported. Computerized tomography confirmed the presence of bilateral retrobulbar masses, involvement of both lateral recti, erosion of the bony orbital floor with encroachment of tumor into the right maxillary antrum, and retropharyngeal involvement.

  3. Parallel programming with PCN

    SciTech Connect (OSTI)

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

  4. UPC (Unified Parallel C)

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    UPC (Unified Parallel C) Description Unified Parallel C is a partitioned global address space (PGAS) language and an extension of the C programming language. Availability UPC is available on Edison and Hopper via both the Cray compilers, as well as through Berkeley UPC, a portable high-performance UPC compiler and runtime implementation. Using UPC To compile a UPC source file using the Cray compilers, you must first swap the Cray compiler with the default compiler. On Hopper: % module swap

  5. A Massive Stellar Burst Before the Supernova

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    A Massive Stellar Burst Before the Supernova. February 6, 2013. Contact: Linda Vu, lvu@lbl.gov, +1 510 495 2402. An automated supernova ...

  6. Strangeness production with "massive" gluons

    SciTech Connect (OSTI)

    Biro, T.S. (Institut fuer Theoretische Physik, Justus-Liebig-Universitaet, Giessen); Levai, P.; Mueller, B.

    1990-11-01

    We present a perturbative calculation of strange-quark production by the processes g → s + s̄, g + g → s + s̄, and q + q̄ → s + s̄ in a quark-gluon plasma containing gluons that are effectively "massive" due to medium effects. We consider only transverse polarizations of the gluons. We find that for a gluon mass beyond 300 MeV the one-gluon decay dominates, and that there is an enhancement of s s̄ production from massive gluons compared with the massless case.

  7. An efficient parallel algorithm for matrix-vector multiplication

    SciTech Connect (OSTI)

    Hendrickson, B.; Leland, R.; Plimpton, S.

    1993-03-01

    The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
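    The communication bound quoted in this abstract comes from a two-dimensional block decomposition: a √p × √p grid of processors each owns an (n/√p)-square block of the matrix and a slice of the vector, so each processor exchanges only O(n/√p) words per multiply. A serial mock-up of that decomposition (sizes chosen arbitrarily, "processors" simulated by loop indices):

```python
import numpy as np

n, q = 8, 2                      # q = sqrt(p), so p = 4 "processors"
b = n // q                       # each block is b x b = (n/sqrt(p))^2
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

y = np.zeros(n)
for i in range(q):               # processor rows
    for j in range(q):           # processor columns
        # local multiply: block (i, j) times the x-slice owned by column j;
        # the accumulation across j is the row-wise reduction whose
        # message volume is the n/sqrt(p) term in the cost bound
        y[i * b:(i + 1) * b] += (
            A[i * b:(i + 1) * b, j * b:(j + 1) * b] @ x[j * b:(j + 1) * b]
        )

print(np.allclose(y, A @ x))  # -> True
```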

  8. Parallel Tensor Compression for Large-Scale Scientific Data.

    SciTech Connect (OSTI)

    Kolda, Tamara G.; Ballard, Grey; Austin, Woody Nathan

    2015-10-01

    As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10,000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
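    The Tucker compression idea can be illustrated serially with a truncated higher-order SVD (HOSVD). This is a textbook sketch, not the paper's distributed implementation; the tensor sizes and ranks below are arbitrary:

```python
import numpy as np

def unfold(T, mode):
    # matricize T with `mode` as rows (ordering matches mode_mult below)
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mult(T, M, mode):
    # multiply tensor T by matrix M along `mode`
    Tm = np.moveaxis(T, mode, 0)
    out = (M @ Tm.reshape(Tm.shape[0], -1)).reshape((M.shape[0],) + Tm.shape[1:])
    return np.moveaxis(out, 0, mode)

def hosvd(T, ranks):
    # truncated HOSVD: one factor per mode from that mode's unfolding,
    # then shrink the core by multiplying with the factor transposes
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = T
    for m, U in enumerate(factors):
        core = mode_mult(core, U.T, m)
    return core, factors

# Build an exactly rank-(2, 2, 2) tensor, compress it, and reconstruct.
rng = np.random.default_rng(2)
G = rng.standard_normal((2, 2, 2))
Us = [rng.standard_normal((12, 2)) for _ in range(3)]
T = G
for m, U in enumerate(Us):
    T = mode_mult(T, U, m)

core, factors = hosvd(T, (2, 2, 2))
R = core
for m, U in enumerate(factors):
    R = mode_mult(R, U, m)

# 12^3 = 1728 entries stored as an 8-entry core plus three 12x2 factors
print(np.allclose(T, R))  # -> True
```

    For a tensor whose multilinear rank exceeds the chosen ranks, the same procedure gives a lossy compression whose error is controlled by the discarded singular values, which is the trade-off the abstract's compression ratios refer to.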

  9. Parallel time integration software

    Energy Science and Technology Software Center (OSTI)

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieving parallelism in time is multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.

  10. Parallel community climate model: Description and user's guide

    SciTech Connect (OSTI)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H.

    1996-07-15

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.

  11. Parallel optical sampler

    DOE Patents [OSTI]

    Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A

    2014-05-20

An optical sampler includes first and second 1×n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode-interconnected Mach-Zehnder Modulator. A method of sampling the optical analog input signal is disclosed.

  12. Parallel programming with Ada

    SciTech Connect (OSTI)

    Kok, J.

    1988-01-01

To the human programmer, the ease of coding distributed computing is highly dependent on the suitability of the employed programming language. With a particular language it is also important whether the possibilities of one or more parallel architectures can be efficiently addressed by the available language constructs. In this paper the possibilities of the high-level language Ada, and in particular of its tasking concept, are discussed as a descriptive tool for the design and implementation of numerical and other algorithms that allow execution of parts in parallel. Language tools are explained and their use for common applications is shown. Conclusions are drawn about the usefulness of several Ada concepts.

  13. Parallel Multigrid Equation Solver

    Energy Science and Technology Software Center (OSTI)

    2001-09-07

Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved problems of up to 76 million degrees of freedom, problems in linear elasticity on the ASCI Blue Pacific and ASCI Red machines.

  14. Parallel programming with PCN

    SciTech Connect (OSTI)

    Foster, I.; Tuecke, S.

    1993-01-01

PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  15. Parallel Total Energy

    Energy Science and Technology Software Center (OSTI)

    2004-10-21

This is a total energy electronic structure code using the Local Density Approximation (LDA) of density functional theory. It uses plane waves as the wave-function basis set. It can use both norm-conserving pseudopotentials and ultrasoft pseudopotentials. It can relax the atomic positions according to the total energy. It is a parallel code using MPI.

  16. A Massive Stellar Burst Before the Supernova

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

A Massive Stellar Burst Before the Supernova February 6, 2013 Contact: Linda Vu, lvu@lbl.gov, +1 510 495 2402 An automated supernova hunt is shedding new light on the death sequence of massive stars, specifically the kind that self-destruct in Type IIn supernova explosions. Digging through the Palomar Transient Factory (PTF) data archive housed at the Department of Energy's National Energy Research Scientific Computing Center (NERSC) at Lawrence

  17. Parallel grid population

    DOE Patents [OSTI]

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
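The two-phase method in this patent record can be illustrated with a small serial sketch over a 1-D grid of unit cells; in a real implementation, phase 1 would run in parallel over the n object sets and phase 2 over the n grid portions. All names here are hypothetical, invented for illustration.

```python
# Serial sketch (hypothetical names) of the two-phase method: phase 1
# determines which grid portions at least partially bound each object,
# phase 2 lets each portion collect the objects so determined.

def portions_bounding(obj, cell_edges):
    """Return indices of 1-D cells [a, b) that the interval obj overlaps."""
    lo, hi = obj
    return {i for i, (a, b) in enumerate(zip(cell_edges, cell_edges[1:]))
            if lo < b and hi > a}

def populate(objects, cell_edges):
    n_cells = len(cell_edges) - 1
    # Phase 1: in the patent, each processor handles one object set.
    bounded = {obj: portions_bounding(obj, cell_edges) for obj in objects}
    # Phase 2: in the patent, each processor populates one grid portion.
    return {i: [o for o, cells in bounded.items() if i in cells]
            for i in range(n_cells)}

grid = populate([(0.5, 1.5), (2.2, 2.8)], [0, 1, 2, 3])
# cells 0 and 1 hold the first object, cell 2 the second
```

The key property is that the expensive bounding tests and the population writes are each partitioned across processors, with objects spanning several portions appearing in each of them.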

  18. Xyce parallel electronic simulator.

    SciTech Connect (OSTI)

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, to the extent possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  19. Exploiting Network Parallelism

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Exploiting Network Parallelism for Improving Data Transfer Performance Dan Gunter ∗ , Raj Kettimuthu † , Ezra Kissel ‡ , Martin Swany ‡ , Jun Yi § , Jason Zurawski ¶ ∗ Advanced Computing for Science Department, Lawrence Berkeley National Laboratory, Berkeley, CA † Mathematics and Computer Science Division, Argonne National Laboratory Argonne, IL ‡ School of Informatics and Computing, Indiana University, Bloomington, IN § Computation Institute, University of Chicago/Argonne

  20. Detecting torsion from massive electrodynamics

    SciTech Connect (OSTI)

Garcia de Andrade, L.C.; Lopes, M.

    1993-11-01

A new method of detecting torsion in the case of massive electrodynamics is proposed. The method is based on the study of spectral lines of hydrogen-like atoms placed in a torsion field, where the interaction energy between the torsion vector field Q and an electric dipole is given by ε ∼ p · Q. All the methods designed so far have been based on spinning test particles interacting with magnetic fields, in which the energy splitting is given by ε ∼ S · B in a Stern-Gerlach type experiment. The authors arrive at an energy splitting of order ε ∼ 10⁻²¹ erg ≈ 10⁻⁹ eV, which is within the frequency band of radio waves. 15 refs.

  1. Hybrid Parallel Programming with MPI and Unified Parallel C | Argonne

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Leadership Computing Facility Parallel Programming with MPI and Unified Parallel C Authors: Dinan, J., Balaji, P., Lusk, E., Sadayappan, P., Thakur, R. The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity

  2. Allinea DDT as a Parallel Debugging Alternative to Totalview

    SciTech Connect (OSTI)

    Antypas, K.B.

    2007-03-05

Totalview, from the Etnus Corporation, is a sophisticated and feature-rich software debugger for parallel applications. As Totalview has gained popularity and market share, its price has increased to the point where it is often prohibitively expensive for massively parallel supercomputers, and many of its advanced features are not used by members of the scientific computing community. For these reasons, supercomputing centers have begun searching for a viable parallel debugging alternative. DDT (Distributed Debugging Tool) from Allinea Software is a relatively new parallel debugging tool which aims to provide much of the same functionality as Totalview. This review outlines the basic features and limitations of DDT to determine whether it can be a reasonable substitute for Totalview. DDT was tested on the NERSC platforms Bassi, Seaborg, Jacquard and Davinci with Fortran90, C, and C++ codes using MPI and OpenMP for parallelism.

  3. Applied Parallel Metadata Indexing

    SciTech Connect (OSTI)

    Jacobi, Michael R

    2012-08-01

The GPFS Archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and search the archive using standard and user-defined metadata, while maintaining security. Last summer, I developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, I implemented security and developed the user interface for the search tool. To meet security requirements, each database table is associated with a single user, only stores records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.

  4. Parallel ptychographic reconstruction

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-12-19

Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by the scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.

  5. Parallel ptychographic reconstruction

    SciTech Connect (OSTI)

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-12-19

Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by the scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.
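The merging of per-GPU sub-reconstructions into a single image can be illustrated naively by averaging overlapping regions. This is an assumption for illustration only; the merge techniques in the paper are more elaborate, and the function name below is ours.

```python
# Naive merge sketch: average overlapping pixels of per-worker
# sub-reconstructions. Illustrates only the general idea of stitching
# sub-datasets into one image, not the paper's actual technique.

def merge(subimages, width):
    """subimages: list of (offset, pixel_values) covering parts of [0, width)."""
    acc = [0.0] * width
    cnt = [0] * width
    for offset, values in subimages:
        for i, v in enumerate(values):
            acc[offset + i] += v
            cnt[offset + i] += 1
    # Average wherever sub-images overlap; leave uncovered pixels at 0.
    return [a / c if c else 0.0 for a, c in zip(acc, cnt)]

merged = merge([(0, [1.0, 1.0, 1.0]), (2, [3.0, 3.0])], 4)
# the overlapping pixel at index 2 becomes (1.0 + 3.0) / 2 = 2.0
```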

  6. Unified Parallel Software

    Energy Science and Technology Software Center (OSTI)

    2003-12-01

UPS (Unified Parallel Software) is a collection of software tools (libraries, scripts, executables) that assist in parallel programming. This consists of: o libups.a C/Fortran callable routines for message passing (utilities written on top of MPI) and file IO (utilities written on top of HDF). o libuserd-HDF.so EnSight user-defined reader for visualizing data files written with UPS File IO. o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl Executables/scripts to get information from data files and to simplify the use of EnSight on those data files. o ups_io_rm/ups_io_cp Manipulate data files written with UPS File IO. These tools are portable to a wide variety of Unix platforms.

  7. BlueGene/L Applications: Parallelism on a Massive Scale (Journal...

    Office of Scientific and Technical Information (OSTI)

    131,072 processors and absolute performance with a peak rate of 367 TFlops. BGL has led the Top500 list the last four times with a Linpack rate of 280.6 TFlops for the full ...

  8. CASL-U-2015-0170-000 SHIFT: A Massively Parallel Monte Carlo

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    on Mathematics and Computation (M&C), Supercomputing in Nuclear Applications ... results from the Babcock and Wilcox 1810 (BW1810) criticality experiments 18 are presented. ...

  9. A new quasidilaton theory of massive gravity (Journal Article...

    Office of Scientific and Technical Information (OSTI)

    A new quasidilaton theory of massive gravity Citation Details In-Document Search Title: A new quasidilaton theory of massive gravity We present a new quasidilaton theory of...

  10. Massive Hanford Test Reactor Removed - Plutonium Recycle Test...

    Office of Environmental Management (EM)

    Massive Hanford Test Reactor Removed - Plutonium Recycle Test Reactor removed from Hanford's 300 Area Massive Hanford Test Reactor Removed - Plutonium Recycle Test Reactor removed ...

  11. SDSS-III: Massive Spectroscopic Surveys of the Distant Universe...

    Office of Scientific and Technical Information (OSTI)

    Massive Spectroscopic Surveys of the Distant Universe, the Milky Way Galaxy, and Extra-Solar Planetary Systems Citation Details In-Document Search Title: SDSS-III: Massive...

  12. A cosmological study in massive gravity theory

    SciTech Connect (OSTI)

    Pan, Supriya Chakraborty, Subenoy

    2015-09-15

A detailed study of various cosmological aspects of massive gravity theory is presented in this work. For the homogeneous and isotropic FLRW model, the deceleration parameter has been evaluated, and it has been examined whether there is a transition from deceleration to acceleration in the recent past. With a proper choice of the free parameters, it has been shown that massive gravity theory is equivalent to Einstein gravity with a modified Newtonian gravitational constant together with a negative cosmological constant. Also, in this context, it has been examined whether the emergent scenario is possible in massive gravity theory. Finally, we have performed a cosmographic analysis in massive gravity theory.

  13. Parallel Computing Summer Research Internship

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Recommended Reading & Resources Parallel Computing Summer Research Internship Creates next-generation leaders in HPC research and applications development Contacts Program Co-Lead ...

  14. Dynamic Star Formation in the Massive DR21 Filament

    SciTech Connect (OSTI)

    Schneider, N.; Csengeri, T.; Bontemps, S.; Motte, F.; Simon, R.; Hennebelle, P.; Federrath, C.; Klessen, R.; /ZAH, Heidelberg /KIPAC, Menlo Park

    2010-08-25

The formation of massive stars is a highly complex process in which it is unclear whether the star-forming gas is in global gravitational collapse or an equilibrium state supported by turbulence and/or magnetic fields. By studying one of the most massive and dense star-forming regions in the Galaxy at a distance of less than 3 kpc, i.e. the filament containing the well-known sources DR21 and DR21(OH), we attempt to obtain observational evidence to help us to discriminate between these two views. We use molecular line data from our ¹³CO 1→0, CS 2→1, and N₂H⁺ 1→0 survey of the Cygnus X region obtained with the FCRAO, and CO, CS, HCO⁺, N₂H⁺, and H₂CO data obtained with the IRAM 30m telescope. We observe a complex velocity field and velocity dispersion in the DR21 filament, in which regions of the highest column density, i.e. dense cores, have a lower velocity dispersion than the surrounding gas and velocity gradients that are not (only) due to rotation. Infall signatures in optically thick line profiles of HCO⁺ and ¹²CO are observed along and across the whole DR21 filament. By modelling the observed spectra, we obtain a typical infall speed of ~0.6 km s⁻¹ and mass accretion rates of the order of a few 10⁻³ M☉ yr⁻¹ for the two main clumps constituting the filament. These massive clumps (4900 and 3300 M☉ at densities of around 10⁵ cm⁻³ within 1 pc diameter) are both gravitationally contracting. The more massive of the clumps, DR21(OH), is connected to a sub-filament, apparently 'falling' onto the clump. This filament runs parallel to the magnetic field. Conclusions: All observed kinematic features in the DR21 filament (velocity field, velocity dispersion, and infall), its filamentary morphology, and the existence of (a) sub-filament(s) can be explained if the DR21 filament was formed by the convergence of flows on large

  15. A two-level parallel direct search implementation for arbitrarily sized objective functions

    SciTech Connect (OSTI)

    Hutchinson, S.A.; Shadid, J.N.; Moffat, H.K.; Ng, K.T.

    1994-02-21

In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p₂) of processors. If p, the number of processors available, is greater than or equal to 2p₂, then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency, and further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for an nCUBE 2 implementation.
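The processor-allocation rule described above (each objective evaluation occupies p₂ processors, and the PDS search level is itself parallelized when p ≥ 2p₂) reduces to a small arithmetic sketch. The function name is ours, not the paper's.

```python
# Arithmetic sketch (function name ours) of the two-level split: each
# objective evaluation occupies p2 processors; with p processors total,
# p // p2 trial points can be evaluated concurrently, so the search
# level is parallelized whenever p >= 2 * p2.

def two_level_split(p, p2):
    """Return (concurrent_evaluations, processors_per_evaluation)."""
    if p < p2:
        raise ValueError("not enough processors for a single evaluation")
    return p // p2, p2

# 512 processors with 128-processor finite-element evaluations:
# four PDS trial points run at once.
print(two_level_split(512, 128))   # (4, 128)
```

This matches the efficiency argument in the abstract: p₂ is chosen for peak per-evaluation parallel efficiency, and any remaining processors buy speedup at the search level instead.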

  16. A two-level parallel direct search implementation for arbitrarily sized objective functions

    SciTech Connect (OSTI)

    Hutchinson, S.A.; Shadid, N.; Moffat, H.K.

    1994-12-31

In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p₂) of processors. If p, the number of processors available, is greater than or equal to 2p₂, then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency, and further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for an nCUBE 2 implementation.

  17. Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P

    SciTech Connect (OSTI)

    Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Syratchev, I.; /CERN

    2009-06-19

    In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).

  18. Parallel processing for control applications

    SciTech Connect (OSTI)

    Telford, J. W.

    2001-01-01

Parallel processing has been a topic of discussion in computer science circles for decades. Using more than one computer to control a process has many advantages that compensate for the additional cost. Initially, multiple computers were used to attain higher speeds: a single CPU could not perform all of the operations necessary for real-time operation. As technology progressed and CPUs became faster, the speed issue became less significant, though the additional processing capability continues to make high speed an attractive element of parallel processing. Another reason for multiple processors is reliability. For the purpose of this discussion, reliability and robustness will be the focal point. Most contemporary conceptions of parallel processing include visions of hundreds of single computers networked to provide 'computing power'. Indeed, our own teraflop machines are built from large numbers of computers configured in a network (and thus limited by the network). There are many approaches to parallel configurations, and this presentation offers something slightly different from the contemporary networked model. In the world of embedded computers, which is a pervasive force in contemporary computer controls, there are many single-chip computers available. If one backs away from the PC-based parallel computing model and considers the possibilities of a parallel control device based on multiple single-chip computers, a new area of possibilities becomes apparent. This study will look at the use of multiple single-chip computers in a parallel configuration with emphasis placed on maximum reliability.

  19. Primordial Li abundance and massive particles

    SciTech Connect (OSTI)

Ðapo, H.

    2012-10-20

The problem of the observed lithium abundance coming from Big Bang Nucleosynthesis is as yet unsolved. One of the proposed solutions is including relic massive particles in Big Bang Nucleosynthesis. We investigated the effects of such particles on ⁴HeX⁻ + ²H → ⁶Li + X⁻, where X⁻ is the negatively charged massive particle. We demonstrate the dominance of the long-range part of the potential on the cross-section.

  20. Growth histories in bimetric massive gravity

    SciTech Connect (OSTI)

    Berg, Marcus; Buchberger, Igor; Enander, Jonas; Mörtsell, Edvard; Sjörs, Stefan E-mail: igor.buchberger@kau.se E-mail: edvard@fysik.su.se

    2012-12-01

    We perform cosmological perturbation theory in Hassan-Rosen bimetric gravity for general homogeneous and isotropic backgrounds. In the de Sitter approximation, we obtain decoupled sets of massless and massive scalar gravitational fluctuations. Matter perturbations then evolve like in Einstein gravity. We perturb the future de Sitter regime by the ratio of matter to dark energy, producing quasi-de Sitter space. In this more general setting the massive and massless fluctuations mix. We argue that in the quasi-de Sitter regime, the growth of structure in bimetric gravity differs from that of Einstein gravity.

  1. NON-AQUEOUS DISSOLUTION OF MASSIVE PLUTONIUM

    DOE Patents [OSTI]

    Reavis, J.G.; Leary, J.A.; Walsh, K.A.

    1959-05-12

A method is presented for obtaining non-aqueous solutions of plutonium from massive forms of the metal. In the present invention massive plutonium is added to a salt melt consisting of 10 to 40 weight per cent of sodium chloride and the balance zinc chloride. The plutonium reacts at about 800 °C with the zinc chloride to form a salt bath of plutonium trichloride, sodium chloride, and metallic zinc. The zinc is separated from the salt melt by forcing the molten mixture through a Pyrex filter.

  2. Parallel Computing Summer Research Internship

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Recommended Reading & Resources Parallel Computing Summer Research Internship Creates next-generation leaders in HPC research and applications development Contacts Program Co-Lead Robert (Bob) Robey Email Program Co-Lead Gabriel Rockefeller Email Program Co-Lead Hai Ah Nam Email Professional Staff Assistant Nickole Aguilar Garcia (505) 665-3048 Email Recommended Reading & References The Parallel Computing Summer Research Internship covers a broad range of topics that you may not have

  3. Parallel Computing Summer Research Internship

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

Parallel Computing Summer Research Internship Creates next-generation leaders in HPC research and applications development Contacts Program Co-Lead Robert (Bob) Robey Email Program Co-Lead Gabriel Rockefeller Email Program Co-Lead Hai Ah Nam Email Professional Staff

  4. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  5. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  6. Parallel Programming and Optimization for Intel Architecture

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Parallel Programming and Optimization for Intel Architecture Parallel Programming and Optimization for Intel Architecture August 14, 2015 by Richard Gerber Intel is sponsoring a ...

  7. Computing contingency statistics in parallel.

    SciTech Connect (OSTI)

    Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

    2010-09-01

Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics, where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
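The map-and-merge pattern the abstract describes is easy to sketch: each worker tallies category pairs from its own data shard, and the reduce step is element-wise addition of counts, which is exactly the communication that grows with table size. The tiny shards, labels, and PMI helper below are illustrative inventions, not the paper's implementation.

```python
from collections import Counter
from math import log2

def map_count(shard):
    """Map step: build a partial contingency table for one shard."""
    return Counter(shard)

def reduce_merge(tables):
    """Reduce step: merging partial tables is element-wise addition."""
    total = Counter()
    for t in tables:
        total += t
    return total

shards = [
    [("a", "x"), ("a", "y"), ("b", "x")],
    [("a", "x"), ("b", "y"), ("b", "y")],
]
table = reduce_merge(map_count(s) for s in shards)
n = sum(table.values())

# Derived statistics: marginals and point-wise mutual information.
px = Counter(); py = Counter()
for (x, y), c in table.items():
    px[x] += c
    py[y] += c

def pmi(x, y):
    """Point-wise mutual information of one (x, y) cell."""
    pxy = table[(x, y)] / n
    return log2(pxy / ((px[x] / n) * (py[y] / n)))
```

In a real distributed setting the `reduce_merge` step would be an MPI reduction; here it just shows why the communication volume scales with the number of distinct cells.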

  8. Scientists say climate change could cause a 'massive' tree die...

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Climate change could cause a 'massive' tree die-off in the U.S. Southwest Scientists say climate change could cause a 'massive' tree die-off in the U.S. Southwest In a troubling ...

  9. Materials Project Releases Massive Trove of Battery and Molecule...

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Materials Project Releases Massive Trove of Battery and Molecule Data Materials Project Releases Massive Trove of Battery and Molecule Data June 8, 2016 Julie Chao, JHChao@lbl.gov, ...

  10. Search for massive resonances in dijet systems containing jets...

    Office of Scientific and Technical Information (OSTI)

    massive resonances in dijet systems containing jets tagged as W or Z boson decays in pp collisions at √s = 8 TeV Re-direct Destination: Search for massive resonances in dijet...

  11. Monumental effort: How a dedicated team completed a massive beam...

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Plus One Share on Facebook An overhead crane lifts the massive box into the NSTX-U test cell (Photo by Mike Viola) An overhead crane lifts the massive box into the NSTX-U test ...

  12. Parallel partitioning strategies for the adaptive solution of conservation laws

    SciTech Connect (OSTI)

    Devine, K.D.; Flaherty, J.E.; Loy, R.M.

    1995-12-31

We describe and examine the performance of adaptive methods for solving hyperbolic systems of conservation laws on massively parallel computers. The differential system is approximated by a discontinuous Galerkin finite element method with a hierarchical Legendre piecewise polynomial basis for the spatial discretization. Fluxes at element boundaries are computed by solving an approximate Riemann problem; a projection limiter is applied to keep the average solution monotone; time discretization is performed by Runge-Kutta integration; and a p-refinement-based error estimate is used as an enrichment indicator. Adaptive order (p-) and mesh (h-) refinement algorithms are presented and demonstrated. Using an element-based dynamic load balancing algorithm called tiling and adaptive p-refinement, parallel efficiencies of over 60% are achieved on a 1024-processor nCUBE/2 hypercube. We also demonstrate a fast, tree-based parallel partitioning strategy for three-dimensional octree-structured meshes. This method produces partition quality comparable to recursive spectral bisection at a greatly reduced cost.

  13. Cosmology in general massive gravity theories

    SciTech Connect (OSTI)

    Comelli, D.; Nesti, F.; Pilo, L. E-mail: fabrizio.nesti@aquila.infn.it

    2014-05-01

We study the flat FRW cosmological solutions generated in general massive gravity theories. Such models are obtained by adding to the Einstein General Relativity action a peculiar non-derivative potential, a function of the metric components, that induces the propagation of five gravitational degrees of freedom. This large class of theories includes both the case with a residual Lorentz invariance and the case with rotational invariance only. It turns out that the Lorentz-breaking case is selected as the only possibility. Moreover, it turns out that perturbations around strict Minkowski or dS space are strongly coupled. The upshot is that even though dark energy can be simply accounted for by massive gravity modifications, its equation of state w_eff has to deviate from -1. Indeed, there is an explicit relation between the strong coupling scale of perturbations and the deviation of w_eff from -1. Taking into account current limits on w_eff and submillimeter tests of Newton's law as a limit on the possible strong coupling scale, we find that it is still possible to have a weakly coupled theory in a quasi-dS background. Future experimental improvements on short-distance tests of Newton's law may be used to tighten the deviation of w_eff from -1 in a weakly coupled massive gravity theory.

  14. Designing a parallel simula machine

    SciTech Connect (OSTI)

    Papazoglou, M.P.; Georgiadis, P.I.; Maritsas, D.G.

    1983-10-01

    The parallel simula machine (PSM) architecture is based upon a master/slave topology, incorporating a master microprocessor. Interconnection circuitry between the master and slave processor modules uses a timesharing system bus and various programmable interrupt control units. Common and private memory modules reside in the PSM, and direct memory access transfers ease the master processor's workload. 5 references.

  15. Parallel, Distributed Scripting with Python

    SciTech Connect (OSTI)

    Miller, P J

    2002-05-24

Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadmin tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.
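The password-checker task above can be sketched with Python's standard multiprocessing module. This is a stand-in, not the paper's code: a SHA-256 digest replaces the crypt-style encryption, and a four-word list replaces the 25,000-word dictionary.

```python
import hashlib
from multiprocessing import Pool

# Hypothetical target: the hash of the password we are trying to recover.
TARGET = hashlib.sha256(b"hunter2").hexdigest()

def check_chunk(words):
    """Each worker independently hashes its share of the dictionary."""
    return [w for w in words if hashlib.sha256(w.encode()).hexdigest() == TARGET]

def crack(dictionary, nworkers=4):
    """Split the dictionary into strided chunks and check them in parallel."""
    chunks = [dictionary[i::nworkers] for i in range(nworkers)]
    with Pool(nworkers) as pool:
        results = pool.map(check_chunk, chunks)
    return [w for hits in results for w in hits]

if __name__ == "__main__":
    words = ["password", "letmein", "hunter2", "qwerty"]
    print(crack(words))  # ['hunter2']
```

The work distributes trivially because each chunk is independent; the only coordination is gathering the (rare) hits at the end, which is exactly why the abstract calls the task "trivial to parallelize".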

  16. Parallelization

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    ... that many intermediate calculations are done in place rather than saving values in memory. ... many computer cores to leverage more computing power. 3 1.2 Magnetohydrodynamics A ...

  17. (The use of parallel computers and multiple scattering Green function methods in condensed matter physics)

    SciTech Connect (OSTI)

    Stocks, G.M.

    1990-11-30

The traveler presented invited lectures "Parallelizing the Multiple Scattering KKR and KKR-CPA Codes" at a workshop on "Parallel Codes and Algorithms for Electronic Structure of Solids," held at the Science and Engineering Research Council (SERC) Daresbury Laboratory, and "SCF-KKR-CPA Calculations" at a meeting on "KKR" and related scattering theory, held at the University of Bristol. The Daresbury meeting reviewed the use of massively parallel computers in condensed matter physics, an area in which ORNL is playing a leading role. The Bristol meeting highlighted the great progress that has been made in recent years in the first principles theory and calculation of the properties of materials based on multiple-scattering Green function methods. This is an area in which, historically, ORNL has had a strong presence. The traveler collaborated with scientists at SERC Daresbury Laboratory on the use of the massively parallel INTEL i860 supercomputer in the calculation of the electronic and ground state properties of alloys and high-Tc superconductors. At the Universities of Warwick and Bristol, the traveler collaborated with Dr. J. B. Staunton and Prof. B. L. Gyorffy on spin, charge, and pairing fluctuations in the Hubbard model.

  18. Parallel multiplex laser feedback interferometry

    SciTech Connect (OSTI)

    Zhang, Song; Tan, Yidong; Zhang, Shulian

    2013-12-15

    We present a parallel multiplex laser feedback interferometer based on spatial multiplexing which avoids the signal crosstalk in the former feedback interferometer. The interferometer outputs two close parallel laser beams, whose frequencies are shifted by two acousto-optic modulators by 2Ω simultaneously. A static reference mirror is inserted into one of the optical paths as the reference optical path. The other beam impinges on the target as the measurement optical path. Phase variations of the two feedback laser beams are simultaneously measured through heterodyne demodulation with two different detectors. Their subtraction accurately reflects the target displacement. Under typical room conditions, experimental results show a resolution of 1.6 nm and accuracy of 7.8 nm within the range of 100 μm.

  19. Parallel Computing Summer Research Internship

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Mentors Parallel Computing Summer Research Internship Creates next-generation leaders in HPC research and applications development Contacts Program Co-Lead Robert (Bob) Robey Email Program Co-Lead Gabriel Rockefeller Email Program Co-Lead Hai Ah Nam Email Professional Staff Assistant Nickole Aguilar Garcia (505) 665-3048 Email 2016: Mentors Bob Robey Bob Robey XCP-2: EULERIAN CODES Bob Robey is a Research Scientist in the Eulerian Applications group at Los Alamos National Laboratory. He is the

  20. Parallel Computing Summer Research Internship

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Students Parallel Computing Summer Research Internship Creates next-generation leaders in HPC research and applications development Contacts Program Co-Lead Robert (Bob) Robey Email Program Co-Lead Gabriel Rockefeller Email Program Co-Lead Hai Ah Nam Email Professional Staff Assistant Nickole Aguilar Garcia (505) 665-3048 Email 2016: Students Peter Ahrens Peter Ahrens Electrical Engineering & Computer Science BS UC Berkeley Jenniffer Estrada Jenniffer Estrada Computer Science MS Youngstown

  1. Parallel Computing Summer Research Internship

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Guide to Los Alamos Parallel Computing Summer Research Internship Creates next-generation leaders in HPC research and applications development Contacts Program Co-Lead Robert (Bob) Robey Email Program Co-Lead Gabriel Rockefeller Email Program Co-Lead Hai Ah Nam Email Professional Staff Assistant Nickole Aguilar Garcia (505) 665-3048 Email Guide to Los Alamos During your 10-week internship, we hope you have the opportunity to explore and enjoy Los Alamos and the surrounding area. Here are some

  2. Parallel Power Grid Simulation Toolkit

    Energy Science and Technology Software Center (OSTI)

    2015-09-14

ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete event parallel simulations. The code is designed using modern object-oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  3. A new quasidilaton theory of massive gravity

    SciTech Connect (OSTI)

    Mukohyama, Shinji

    2014-12-01

    We present a new quasidilaton theory of Poincare invariant massive gravity, based on the recently proposed framework of matter coupling that makes it possible for the kinetic energy of the quasidilaton scalar to couple to both physical and fiducial metrics simultaneously. We find a scaling-type exact solution that expresses a self-accelerating de Sitter universe, and then analyze linear perturbations around it. It is shown that in a range of parameters all physical degrees of freedom have non-vanishing quadratic kinetic terms and are stable in the subhorizon limit, while the effective Newton's constant for the background is kept positive.

  4. Parallelization and checkpointing of GPU applications through program transformation

    SciTech Connect (OSTI)

Solano-Quinde, Lizandro Damián

    2012-11-15

GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for running on GPUs tractable has consolidated GPUs as an alternative for accelerating general-purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with a large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime; however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems presents new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism

  5. Parallel continuation-based global optimization for molecular conformation and protein folding

    SciTech Connect (OSTI)

    Coleman, T.F.; Wu, Z. [Cornell Univ., Ithaca, NY (United States)

    1994-12-31

This paper presents the authors' recent work on developing parallel algorithms and software for solving the global minimization problem for molecular conformation, especially protein folding. Global minimization problems are difficult to solve when the objective functions have many local minimizers, such as the energy functions for protein folding. In their approach, to avoid directly minimizing a "difficult" function, a special integral transformation is introduced to transform the function into a class of gradually deformed, but "smoother" or "easier" functions. An optimization procedure is then applied to the new functions successively, to trace their solutions back to the original function. The method can be applied to a large class of nonlinear partially separable functions including energy functions for molecular conformation and protein folding. Mathematical theory for the method, as a special continuation approach to global optimization, is established. Algorithms with different solution tracing strategies are developed. Different levels of parallelism are exploited for the implementation of the algorithms on massively parallel architectures.
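The continuation idea, minimizing a sequence of gradually less-smoothed functions and tracing the solution from one to the next, can be shown on a toy one-dimensional problem. This is not the authors' integral transform; the deformation below is an invented stand-in where t = 0 is the smooth convex version and t = 1 restores the oscillatory "difficult" function.

```python
import math

def f(x, t):
    """Deformed objective: t=0 is smooth and convex, t=1 is the original."""
    return x * x + t * math.sin(5.0 * x)

def grad(x, t):
    return 2.0 * x + 5.0 * t * math.cos(5.0 * x)

def descend(x, t, step=0.01, iters=2000):
    """Plain gradient descent; any local minimizer would do for the sketch."""
    for _ in range(iters):
        x -= step * grad(x, t)
    return x

x = 2.0                        # arbitrary start on the easy problem
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    x = descend(x, t)          # each solve starts from the previous minimizer
```

Starting the local solver at the previous stage's minimizer is what traces the solution path back to the original function; a direct descent on f(·, 1) from x = 2.0 could instead be trapped in a poor local minimum.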

  6. METHYL CYANIDE OBSERVATIONS TOWARD MASSIVE PROTOSTARS

    SciTech Connect (OSTI)

    Rosero, V.; Hofner, P.; Kurtz, S.; Bieging, J.; Araya, E. D.

    2013-07-01

We report the results of a survey in the CH3CN J = 12 → 11 transition toward a sample of massive proto-stellar candidates. The observations were carried out with the 10 m Submillimeter Telescope on Mount Graham, AZ. We detected this molecular line in 9 out of 21 observed sources. In six cases this is the first detection of this transition. We also obtained full beam sampled cross-scans for five sources which show that the lower K-components can be extended on the arcminute angular scale. The higher K-components, however, are always found to be compact with respect to our 36'' beam. A Boltzmann population diagram analysis of the central spectra indicates CH3CN column densities of about 10^14 cm^-2, and rotational temperatures above 50 K, which confirms these sources as hot molecular cores. Independent fits to line velocity and width for the individual K-components resulted in the detection of an increasing blueshift with increasing line excitation for four sources. Comparison with mid-infrared (mid-IR) images from the SPITZER GLIMPSE/IRAC archive for six sources shows that the CH3CN emission is generally coincident with a bright mid-IR source. Our data clearly show that the CH3CN J = 12 → 11 transition is a good probe of the hot molecular gas near massive protostars, and provide the basis for future interferometric studies.
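The Boltzmann population (rotation) diagram analysis mentioned above reduces to a straight-line fit: ln(N_u/g_u) = ln(N/Q) - E_u/(k T_rot), so the slope of ln(N_u/g_u) versus E_u/k gives the rotational temperature. A sketch with synthetic level populations (not the paper's CH3CN measurements):

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for a straight line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Synthetic populations generated with T_rot = 60 K (illustrative numbers).
T_true = 60.0
E_over_k = [10.0, 50.0, 120.0, 250.0]         # upper-level energies E_u/k [K]
lnNg = [30.0 - e / T_true for e in E_over_k]  # ln(N_u / g_u)

slope, intercept = linear_fit(E_over_k, lnNg)
T_rot = -1.0 / slope   # rotational temperature from the slope
```

With real data the scatter of the K-components about the line would also indicate whether the levels are thermalized.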

  7. Dipolar dark matter with massive bigravity

    SciTech Connect (OSTI)

    Blanchet, Luc; Heisenberg, Lavinia

    2015-12-14

    Massive gravity theories have been developed as viable IR modifications of gravity motivated by dark energy and the problem of the cosmological constant. On the other hand, modified gravity and modified dark matter theories were developed with the aim of solving the problems of standard cold dark matter at galactic scales. Here we propose to adapt the framework of ghost-free massive bigravity theories to reformulate the problem of dark matter at galactic scales. We investigate a promising alternative to dark matter called dipolar dark matter (DDM) in which two different species of dark matter are separately coupled to the two metrics of bigravity and are linked together by an internal vector field. We show that this model successfully reproduces the phenomenology of dark matter at galactic scales (i.e. MOND) as a result of a mechanism of gravitational polarisation. The model is safe in the gravitational sector, but because of the particular couplings of the matter fields and vector field to the metrics, a ghost in the decoupling limit is present in the dark matter sector. However, it might be possible to push the mass of the ghost beyond the strong coupling scale by an appropriate choice of the parameters of the model. Crucial questions to address in future work are the exact mass of the ghost, and the cosmological implications of the model.

  8. Knowledge Discovery from Massive Healthcare Claims Data

    SciTech Connect (OSTI)

    Chandola, Varun; Sukumar, Sreenivas R; Schryver, Jack C

    2013-01-01

The role of big data in addressing the needs of the present healthcare system in the US and the rest of the world has been echoed by government, private, and academic sectors. There has been a growing emphasis to explore the promise of big data analytics in tapping the potential of the massive healthcare data emanating from private and government health insurance providers. While the domain implications of such collaboration are well known, this type of data has been explored to a limited extent in the data mining community. The objective of this paper is twofold: first, we introduce the emerging domain of "big" healthcare claims data to the KDD community, and second, we describe the success and challenges that we encountered in analyzing this data using state-of-the-art analytics for massive data. Specifically, we translate the problem of analyzing healthcare data into some of the most well-known analysis problems in the data mining community: social network analysis, text mining, temporal analysis, and higher order feature construction, and describe how advances within each of these areas can be leveraged to understand the domain of healthcare. Each case study illustrates a unique intersection of data mining and healthcare with a common objective of improving the cost-care ratio by mining for opportunities to improve healthcare operations and reducing what seems to fall under fraud, waste, and abuse.

  9. Performance analysis of high quality parallel preconditioners applied to 3D finite element structural analysis

    SciTech Connect (OSTI)

    Kolotilina, L.; Nikishin, A.; Yeremin, A.

    1994-12-31

The solution of large systems of linear equations is a crucial bottleneck when performing 3D finite element analysis of structures. Also, in many cases the reliability and robustness of iterative solution strategies, and their efficiency when exploiting hardware resources, fully determine the scope of industrial applications which can be solved on a particular computer platform. This is especially true for modern vector/parallel supercomputers with large vector length and for modern massively parallel supercomputers. Preconditioned iterative methods have been successfully applied to industrial class finite element analysis of structures. The construction and application of high quality preconditioners constitutes a high percentage of the total solution time. Parallel implementation of high quality preconditioners on such architectures is a formidable challenge. Two common types of existing preconditioners are the implicit preconditioners and the explicit preconditioners. The implicit preconditioners (e.g. incomplete factorizations of several types) are generally high quality but require solution of lower and upper triangular systems of equations per iteration which are difficult to parallelize without deteriorating the convergence rate. The explicit preconditioners (e.g. polynomial preconditioners or Jacobi-like preconditioners) require sparse matrix-vector multiplications and can be parallelized, but their preconditioning qualities are less than desirable. The authors present results of numerical experiments with Factorized Sparse Approximate Inverses (FSAI) for symmetric positive definite linear systems. These are high quality preconditioners that possess a large resource of parallelism by construction without increasing the serial complexity.

  10. Scalable parallel distance field construction for large-scale applications

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Yu, Hongfeng; Xie, Jinrong; Ma, Kwan -Liu; Kolla, Hemanth; Chen, Jacqueline H.

    2015-10-01

Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial location. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. In conclusion, our work greatly extends the usability of distance fields for demanding applications.
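The parallel distance tree itself is far beyond a snippet, but the quantity it accelerates is simple to state: for every grid point, the distance to the nearest point of the surface of interest. A brute-force serial sketch on a toy 2D grid (the seed points stand in for a surface; the paper's method avoids exactly this all-pairs cost):

```python
import math

def distance_field(shape, surface):
    """Brute-force Euclidean distance from every grid point to the
    nearest 'surface' point. O(grid * surface) — purely illustrative."""
    nx, ny = shape
    return [[min(math.hypot(i - si, j - sj) for si, sj in surface)
             for j in range(ny)]
            for i in range(nx)]

surface = [(0, 0), (3, 4)]           # two seed points standing in for a surface
field = distance_field((5, 5), surface)
```

Scalable implementations replace the inner `min` over all surface points with a spatial data structure, which is the role the paper's distributed parallel distance tree plays.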

  11. Scalable Parallel Algebraic Multigrid Solvers

    SciTech Connect (OSTI)

    Bank, R; Lu, S; Tong, C; Vassilevski, P

    2005-03-23

    The authors propose a parallel algebraic multilevel algorithm (AMG), which has the novel feature that the subproblem residing in each processor is defined over the entire partition domain, although the vast majority of unknowns for each subproblem are associated with the partition owned by the corresponding processor. This feature ensures that a global coarse description of the problem is contained within each of the subproblems. The advantages of this approach are that interprocessor communication is minimized in the solution process while an optimal order of convergence rate is preserved; and the speed of local subproblem solvers can be maximized using the best existing sequential algebraic solvers.

  12. A Novel Application of Parallel Betweenness Centrality to Power Grid Contingency Analysis

    SciTech Connect (OSTI)

    Jin, Shuangshuang; Huang, Zhenyu; Chen, Yousu; Chavarría-Miranda, Daniel; Feo, John T.; Wong, Pak C.

    2010-04-19

    In Energy Management Systems, contingency analysis is commonly performed for identifying and mitigating potentially harmful power grid component failures. The exponentially increasing combinatorial number of failure modes imposes a significant computational burden for massive contingency analysis. It is critical to select a limited set of high-impact contingency cases within the constraint of computing power and time requirements to make it possible for real-time power system vulnerability assessment. In this paper, we present a novel application of parallel betweenness centrality to power grid contingency selection. We cross-validate the proposed method using the model and data of the western US power grid, and implement it on a Cray XMT system - a massively multithreaded architecture - leveraging its advantages for parallel execution of irregular algorithms, such as graph analysis. We achieve a speedup of 55 times (on 64 processors) compared against the single-processor version of the same code running on the Cray XMT. We also compare an OpenMP-based version of the same code running on an HP Superdome shared-memory machine. The performance of the Cray XMT code shows better scalability and resource utilization, and shorter execution time for large-scale power grids. This proposed approach has been evaluated in PNNL’s Electricity Infrastructure Operations Center (EIOC). It is expected to provide a quick and efficient solution to massive contingency selection problems to help power grid operators to identify and mitigate potential widespread cascading power grid failures in real time.
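Betweenness centrality, the graph metric at the heart of the proposed contingency selection, can be computed with Brandes' algorithm for unweighted graphs. The five-bus toy topology below is invented for illustration (the paper used the western US grid model on a Cray XMT); the point is that the highest-scoring component is the one whose failure disrupts the most shortest paths.

```python
from collections import deque, defaultdict

def betweenness(graph):
    """Brandes' betweenness centrality for an unweighted adjacency dict."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = defaultdict(float); sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in reverse BFS order.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# A toy 'grid': the middle bus C carries all A/B <-> D/E traffic.
grid = {"A": ["C"], "B": ["C"], "C": ["A", "B", "D", "E"],
        "D": ["C"], "E": ["C"]}
ranking = sorted(betweenness(grid).items(), key=lambda kv: -kv[1])
```

Ranking components by this score is one way to pick a limited, high-impact subset of contingency cases; the outer loop over sources is embarrassingly parallel, which is what the Cray XMT implementation exploits.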

  13. Redshift-space distortions in massive neutrino and evolving dark...

    Office of Scientific and Technical Information (OSTI)

    Redshift-space distortions in massive neutrino and evolving dark energy cosmologies ... This content will become publicly available on March 16, 2017 Title: Redshift-space ...

  14. On the massive transformation from ferrite to austenite in laser...

    Office of Scientific and Technical Information (OSTI)

    austenite in laser welded mo-bearing stainless steels. Citation Details In-Document Search Title: On the massive transformation from ferrite to austenite in laser welded ...

  15. Search for massive WH resonances decaying into the $$\\ell \

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Khachatryan, Vardan

    2016-04-28

In this study, a search for a massive resonance $${\\mathrm{W}^{\\prime }}$$ decaying into a W and a Higgs boson in the $$\\ell \

  16. Xyce parallel electronic simulator design.

    SciTech Connect (OSTI)

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed-memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines, and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians, and computer scientists. In addition to diversity of background, it is to be expected on long-term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

  17. Hybrid Optimization Parallel Search PACKage

    Energy Science and Technology Software Center (OSTI)

    2009-11-10

HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.
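HOPSPACK itself is a C++ framework, but the GSS algorithm it ships with belongs to the compass/pattern-search family of derivative-free methods, which is easy to sketch: poll along ± each coordinate direction and halve the step when no poll point improves. A minimal serial sketch of that idea (not the HOPSPACK API, and without HOPSPACK's parallel evaluation, caching, or constraint handling):

```python
def compass_search(f, x, step=1.0, tol=1e-6, max_iter=10000):
    """Derivative-free compass search: poll +/- each coordinate direction,
    accept the first improving point, shrink the step when none improves."""
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for s in (+step, -step):
                trial = list(x)
                trial[i] += s
                ft = f(trial)
                if ft < fx:              # accept the first improving poll point
                    x, fx, improved = trial, ft, True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5                  # no direction improved: refine the mesh
            if step < tol:
                break
    return x, fx

sphere = lambda v: sum(t * t for t in v)
xmin, fmin = compass_search(sphere, [3.0, -2.0])
```

In HOPSPACK the poll points would be farmed out to parallel evaluators and filtered through the evaluation cache; this sketch only shows the sequential skeleton of the search.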

  18. Device for balancing parallel strings

    DOE Patents [OSTI]

    Mashikian, Matthew S.

    1985-01-01

    A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.

  19. Information hiding in parallel programs

    SciTech Connect (OSTI)

    Foster, I.

    1992-01-30

    A fundamental principle in program design is to isolate difficult or changeable design decisions. Application of this principle to parallel programs requires identification of decisions that are difficult or subject to change, and the development of techniques for hiding these decisions. We experiment with three complex applications, and identify mapping, communication, and scheduling as areas in which decisions are particularly problematic. We develop computational abstractions that hide such decisions, and show that these abstractions can be used to develop elegant solutions to programming problems. In particular, they allow us to encode common structures, such as transforms, reductions, and meshes, as software cells and templates that can be reused in different applications. An important characteristic of these structures is that they do not incorporate mapping, communication, or scheduling decisions: these aspects of the design are specified separately, when composing existing structures to form applications. This separation of concerns allows the same cells and templates to be reused in different contexts.

  20. Parallel computing in enterprise modeling.

    SciTech Connect (OSTI)

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  1. WCH Removes Massive Test Reactor | Department of Energy

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Hanford's River Corridor contractor, Washington Closure Hanford, has met a significant cleanup challenge on the U.S. Department of Energy's (DOE) Hanford Site by removing a 1,082-ton nuclear test reactor from the 300 Area.

  2. The design of a parallel adaptive paving all-quadrilateral meshing algorithm

    SciTech Connect (OSTI)

    Tautges, T.J.; Lober, R.R.; Vaughan, C.

    1995-08-01

    Adaptive finite element analysis demands a great deal of computational resources, and as such is most appropriately solved in a massively parallel computer environment. This analysis will require other parallel algorithms before it can fully utilize MP computers, one of which is parallel adaptive meshing. A version of the paving algorithm is being designed which operates in parallel but which also retains the robustness and other desirable features present in the serial algorithm. Adaptive paving in a production mode is demonstrated using a Babuska-Rheinboldt error estimator on a classic linearly elastic plate problem. The design of the parallel paving algorithm is described, and is based on the decomposition of a surface into "virtual" surfaces. The topology of the virtual surface boundaries is defined using mesh entities (mesh nodes and edges) so as to allow movement of these boundaries with smoothing and other operations. This arrangement allows the use of the standard paving algorithm on subdomain interiors, after the negotiation of the boundary mesh.

  3. MACHO (MAssive Compact Halo Objects) Data

    DOE Data Explorer [Office of Scientific and Technical Information (OSTI)]

    The primary aim of the MACHO Project is to test the hypothesis that a significant fraction of the dark matter in the halo of the Milky Way is made up of objects like brown dwarfs or planets: these objects have come to be known as MACHOs, for MAssive Compact Halo Objects. The signature of these objects is the occasional amplification of the light from extragalactic stars by the gravitational lens effect. The amplification can be large, but events are extremely rare: it is necessary to monitor photometrically several million stars for a period of years in order to obtain a useful detection rate. For this purpose MACHO has a two channel system that employs eight CCDs, mounted on the 50 inch telescope at Mt. Stromlo. The high data rate (several GBytes per night) is accommodated by custom electronics and on-line data reduction. The Project has taken more than 27,000 images with this system since June 1992. Analysis of a subset of these data has yielded databases containing light curves in two colors for 8 million stars in the LMC and 10 million in the bulge of the Milky Way. A search for microlensing has turned up four candidates toward the Large Magellanic Cloud and 45 toward the Galactic Bulge. The web page for data provides links to MACHO Project data portals and various specialized interfaces for viewing or searching the data. (Specialized Interface)
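    The amplification the project searches for follows the standard point-lens (Paczynski) light-curve formula. The sketch below illustrates that formula only; it is independent of the MACHO pipeline itself, and the function names are ours.

```python
import math

def magnification(u):
    """Standard point-lens microlensing magnification for a source-lens
    separation u, measured in Einstein radii."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))

def separation(t, u0, t0, tE):
    """Separation for a source in uniform relative motion: impact parameter
    u0, time of closest approach t0, Einstein-radius crossing time tE."""
    return math.sqrt(u0 * u0 + ((t - t0) / tE) ** 2)

# An event with impact parameter u0 = 0.1 brightens by roughly a factor
# of ten at peak, then fades symmetrically as the separation grows.
peak = magnification(0.1)
```

    Because high magnification requires a close alignment, such events are rare, which is why millions of stars must be monitored to obtain a useful detection rate.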

  4. Dark aspects of massive spinor electrodynamics

    SciTech Connect (OSTI)

    Kim, Edward J.; Kouwn, Seyen; Oh, Phillial; Park, Chan-Gyung E-mail: seyen@ewha.ac.kr E-mail: parkc@jbnu.ac.kr

    2014-07-01

    We investigate the cosmology of massive spinor electrodynamics when torsion is non-vanishing. A non-minimal interaction is introduced between the torsion and the vector field, and the coupling constant between them plays an important role in the subsequent cosmology. It is shown that the mass of the vector field and torsion conspire to generate dark energy and pressureless dark matter, and for generic values of the coupling constant, the theory effectively provides an interacting model between them with an additional energy density of the form ∝ 1/a{sup 6}. The evolution equations mimic ΛCDM behavior up to the 1/a{sup 3} term and the additional term represents a deviation from ΛCDM. We show that the deviation is compatible with the observational data if it is very small. We find that the non-minimal interaction is responsible for generating an effective cosmological constant which is directly proportional to the mass squared of the vector field, and that the mass of the photon within its current observational limit could be the source of the dark energy.

  5. Parallel Programming and Optimization for Intel Architecture

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    August 14, 2015, by Richard Gerber. Intel is sponsoring a series of webinars entitled "Parallel Programming and Optimization for Intel Architecture." Here's the schedule for August (registration link: https://attendee.gotowebinar.com/register/6325131222429932289). Mon, August 17 - "Hello world from Intel Xeon Phi coprocessors". Overview of architecture,

  6. CASL - The Michigan Parallel Characteristics Transport Code

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Verification of MPACT: The Michigan Parallel Characteristics Transport Code. Benjamin Collins, Brendan Kochunas, Daniel Jabbay, Thomas Downar, William Martin (Department of Nuclear Engineering and Radiological Sciences, University of Michigan); Andrew Godfrey (Oak Ridge National Laboratory). MPACT (Michigan PArallel Characteristics Transport Code) is a new reactor analysis tool being developed at the University of Michigan as an advanced pin-resolved

  7. Parallel auto-correlative statistics with VTK.

    SciTech Connect (OSTI)

    Pebay, Philippe Pierre; Bennett, Janine Camille

    2013-08-01

    This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10], which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by means of C++ code snippets, and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the auto-correlative statistics engine.
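    For readers unfamiliar with the statistic the engine computes, here is a minimal serial sketch of lag-r autocorrelation. The VTK engine distributes this computation and is written in C++; this plain-Python version only illustrates the definition.

```python
import math

def autocorrelation(series, lag):
    """Lag-r autocorrelation: the Pearson correlation of a sequence with a
    copy of itself shifted by `lag` samples."""
    n = len(series) - lag
    x, y = series[:n], series[lag:]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A period-4 oscillation correlates perfectly with itself at lag 4
# and anti-correlates at lag 2.
signal = [0, 1, 0, -1] * 8
```

    The parallel version reduces the per-process partial sums (means, cross-products, squared deviations) before forming the final ratio, which is why the statistic scales well.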

  8. Large N phase transitions in massive N = 2 gauge theories

    SciTech Connect (OSTI)

    Russo, J. G.

    2014-07-23

    Using exact results obtained from localization on S{sup 4}, we explore the large N limit of N = 2 super Yang-Mills theories with massive matter multiplets. In this talk we discuss two cases: N = 2* theory, describing a massive hypermultiplet in the adjoint representation, and super QCD with massive quarks. When the radius of the four-sphere is sent to infinity these theories are described by solvable matrix models, which exhibit a number of interesting phenomena including quantum phase transitions at finite 't Hooft coupling.

  9. SEGUE 2: THE LEAST MASSIVE GALAXY

    SciTech Connect (OSTI)

    Kirby, Evan N.; Boylan-Kolchin, Michael; Bullock, James S.; Kaplinghat, Manoj; Cohen, Judith G.; Geha, Marla

    2013-06-10

    Segue 2, discovered by Belokurov et al., is a galaxy with a luminosity of only 900 L{sub Sun}. We present Keck/DEIMOS spectroscopy of 25 members of Segue 2 - a threefold increase in spectroscopic sample size. The velocity dispersion is too small to be measured with our data. The upper limit with 90% (95%) confidence is {sigma}{sub v} < 2.2 (2.6) km s{sup -1}, the most stringent limit for any galaxy. The corresponding limit on the mass within the three-dimensional half-light radius (46 pc) is M{sub 1/2} < 1.5 (2.1) × 10{sup 5} M{sub Sun}. Segue 2 is the least massive galaxy known. We identify Segue 2 as a galaxy rather than a star cluster based on the wide dispersion in [Fe/H] (from -2.85 to -1.33) among the member stars. The stars' [{alpha}/Fe] ratios decline with increasing [Fe/H], indicating that Segue 2 retained Type Ia supernova ejecta despite its presently small mass and that star formation lasted for at least 100 Myr. The mean metallicity, ([Fe/H]) = -2.22 {+-} 0.13 (about the same as the Ursa Minor galaxy, 330 times more luminous than Segue 2), is higher than expected from the luminosity-metallicity relation defined by more luminous dwarf galaxy satellites of the Milky Way. Segue 2 may be the barest remnant of a tidally stripped, Ursa Minor-sized galaxy. If so, it is the best example of an ultra-faint dwarf galaxy that came to be ultra-faint through tidal stripping. Alternatively, Segue 2 could have been born in a very low mass dark matter subhalo (v{sub max} < 10 km s{sup -1}), below the atomic hydrogen cooling limit.

  10. Parallel 3D Finite Element Particle-in-Cell Simulations with Pic3P

    SciTech Connect (OSTI)

    Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Ben-Zvi, I.; Kewisch, J.; /Brookhaven

    2009-06-19

    SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic Particle-In-Cell code Pic3P. Designed for simulations of beam-cavity interactions dominated by space charge effects, Pic3P solves the complete set of Maxwell-Lorentz equations self-consistently and includes space-charge, retardation and boundary effects from first principles. Higher-order Finite Element methods with adaptive refinement on conformal unstructured meshes lead to highly efficient use of computational resources. Massively parallel processing with dynamic load balancing enables large-scale modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of next-generation accelerator facilities. Applications include the LCLS RF gun and the BNL polarized SRF gun.

  11. Broadcasting a message in a parallel computer

    DOE Patents [OSTI]

    Berg, Jeremy E.; Faraj, Ahmad A.

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
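    One concrete way to establish a Hamiltonian path on a two-dimensional mesh is a boustrophedon ("snake") walk. The sketch below is an illustration of the general idea, not the patented method's actual routing logic; the function names are ours.

```python
def snake_path(rows, cols):
    """One simple Hamiltonian path through a rows x cols mesh: walk each
    row in alternating direction (a boustrophedon), so every node appears
    exactly once and consecutive nodes are mesh neighbors."""
    path = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in cs)
    return path

def forwarding_pairs(path):
    """Each node on the path relays the root's message to its successor."""
    return list(zip(path, path[1:]))

path = snake_path(3, 4)
```

    Broadcasting along such a path means each hop uses only a nearest-neighbor link, which suits networks optimized for point-to-point communication.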

  12. Secretary Chu Announces New Institute to Help Scientists Improve Massive Data Set Research on DOE Supercomputers

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    March 29, 2012 - 2:48pm. Washington D.C. - Energy Secretary Steven Chu today announced $5 million to establish the Scalable Data Management, Analysis and Visualization (SDAV) Institute as part of the Obama Administration's

  13. Massive Hanford Test Reactor Removed - Plutonium Recycle Test Reactor removed from Hanford's 300 Area

    Office of Environmental Management (EM)

    January 22, 2014 - 12:00pm. Media Contacts: Cameron Hardy, DOE, 509-376-5365, Cameron.Hardy@re.doe.gov; Mark McKenna, Washington Closure, 509-372-9032, media@wch-rcc.com. RICHLAND, WA - Hanford's River Corridor contractor, Washington

  14. Materials Project Releases Massive Trove of Battery and Molecule Data

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    June 8, 2016. Julie Chao, JHChao@lbl.gov, (510) 486-6491. Screen shot from the Materials Project website. The Materials Project, a Google-like database of material properties aimed at accelerating innovation, has released an enormous trove of data to the public, giving scientists working on fuel cells, photovoltaics, thermoelectrics and a

  15. Protecting Recovery Act Cleanup Site During Massive Wildfire | Department of Energy

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Effective safety procedures in place at Los Alamos National Laboratory would have provided protections in the event that the raging Las Conchas fire had spread to the site of an American Recovery and Reinvestment Act project. "Our procedures not only placed the waste excavation site, Materials Disposal Area B (MDA-B), into a safe posture so it was well

  16. New approach for the solution of optimal control problems on parallel machines. Doctoral thesis

    SciTech Connect (OSTI)

    Stech, D.J.

    1990-01-01

    This thesis develops a highly parallel solution method for nonlinear optimal control problems. Balakrishnan's epsilon method is used in conjunction with the Rayleigh-Ritz method to convert the dynamic optimization of the optimal control problem into a static optimization problem. Walsh functions and orthogonal polynomials are used as basis functions to implement the Rayleigh-Ritz method. The resulting static optimization problem is solved using matrix operations which have well defined massively parallel solution methods. To demonstrate the method, a variety of nonlinear optimal control problems are solved. The nonlinear Rayleigh problem with quadratic cost and the nonlinear van der Pol problem with quadratic cost and terminal constraints on the states are solved in both serial and parallel on an eight-processor Intel Hypercube. The solutions using both Walsh functions and Legendre polynomials as basis functions are given. In addition to these problems which are solved in parallel, a more complex nonlinear minimum time optimal control problem and a nonlinear optimal control problem with an inequality constraint on the control are solved. Results show the method converges quickly, even from relatively poor initial guesses for the nominal trajectories.

  17. Massively Multi-core Acceleration of a Document-Similarity Classifier to Detect Web Attacks

    SciTech Connect (OSTI)

    Ulmer, C; Gokhale, M; Top, P; Gallagher, B; Eliassi-Rad, T

    2010-01-14

    This paper describes our approach to adapting a text document similarity classifier based on the Term Frequency Inverse Document Frequency (TFIDF) metric to two massively multi-core hardware platforms. The TFIDF classifier is used to detect web attacks in HTTP data. In our parallel hardware approaches, we design streaming, real time classifiers by simplifying the sequential algorithm and manipulating the classifier's model to allow decision information to be represented compactly. Parallel implementations on the Tilera 64-core System on Chip and the Xilinx Virtex 5-LX FPGA are presented. For the Tilera, we employ a reduced state machine to recognize dictionary terms without requiring explicit tokenization, and achieve throughput of 37MB/s at slightly reduced accuracy. For the FPGA, we have developed a set of software tools to help automate the process of converting training data to synthesizable hardware and to provide a means of trading off between accuracy and resource utilization. The Xilinx Virtex 5-LX implementation requires 0.2% of the memory used by the original algorithm. At 166MB/s (80X the software) the hardware implementation is able to achieve Gigabit network throughput at the same accuracy as the original algorithm.
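    The TFIDF metric at the heart of the classifier can be illustrated with a minimal sketch. Tokenization of HTTP data and the streaming/hardware optimizations described above are omitted; the functions and the toy token lists are ours, shown only to make the weighting scheme concrete.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight each term by tf * idf with idf = log(N / df): terms occurring
    in every document contribute nothing; distinctive terms dominate."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["get", "index", "html"],
        ["get", "script", "alert"],   # e.g. a suspicious request
        ["get", "index", "css"]]
vecs = tfidf_vectors(docs)
```

    A request is then classified by comparing its vector against model vectors for benign and attack traffic; the hardware versions compact this model so the comparison can run at line rate.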

  18. System and method for a parallel immunoassay system

    DOE Patents [OSTI]

    Stevens, Fred J.

    2002-01-01

    A method and system for detecting a target antigen using massively parallel immunoassay technology. In this system, high affinity antibodies of the antigen are covalently linked to small beads or particles. The beads are exposed to a solution containing DNA-oligomer-mimics of the antigen. The mimics which are reactive with the covalently attached antibody or antibodies will bind to the appropriate antibody molecule on the bead. The particles or beads are then washed to remove any unbound DNA-oligomer-mimics and are then immobilized or trapped. The bead-antibody complexes are then exposed to a test solution which may contain the targeted antigens. If the antigen is present it will replace the mimic since it has a greater affinity for the respective antibody. The particles are then removed from the solution leaving a residual solution. This residual solution is applied to a DNA chip containing many samples of complementary DNA. If the DNA tag from a mimic binds with its complementary DNA, it indicates the presence of the target antigen. A fluorescent tag can be used to more easily identify the bound DNA tag.

  19. Xyce parallel electronic simulator : users' guide.

    SciTech Connect (OSTI)

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique

  20. Adding Data Management Services to Parallel File Systems

    SciTech Connect (OSTI)

    Brandt, Scott

    2015-03-04

    The objective of this project, called DAMASC for "Data Management in Scientific Computing", is to coalesce data management with parallel file system management to present a declarative interface to scientists for managing, querying, and analyzing extremely large data sets efficiently and predictably. Managing extremely large data sets is a key challenge of exascale computing. The overhead, energy, and cost of moving massive volumes of data demand designs where computation is close to storage. In current architectures, compute/analysis clusters access data in a physically separate parallel file system and largely leave it to the scientist to reduce data movement. Over the past decades the high-end computing community has adopted middleware with multiple layers of abstractions and specialized file formats such as NetCDF-4 and HDF5. These abstractions provide a limited set of high-level data processing functions, but have inherent functionality and performance limitations: middleware that provides access to the highly structured contents of scientific data files stored in the (unstructured) file systems can only optimize to the extent that file system interfaces permit; the highly structured formats of these files often impede native file system performance optimizations. We are developing Damasc, an enhanced high-performance file system with native rich data management services. Damasc will enable efficient queries and updates over files stored in their native byte-stream format while retaining the inherent performance of file system data storage via declarative queries and updates over views of underlying files. Damasc has four key benefits for the development of data-intensive scientific code: (1) applications can use important data-management services, such as declarative queries, views, and provenance tracking, that are currently available only within database systems; (2) the use of these services becomes easier, as they are provided within a familiar file

  1. Paradyn a parallel nonlinear, explicit, three-dimensional finite-element code for solid and structural mechanics user manual

    SciTech Connect (OSTI)

    Hoover, C G; DeGroot, A J; Sherwood, R J

    2000-06-01

    ParaDyn is a parallel version of the DYNA3D computer program, a three-dimensional explicit finite-element program for analyzing the dynamic response of solids and structures. The ParaDyn program has been used as a production tool for over three years for analyzing problems which range in size from a few tens of thousands of elements to between one-million and ten-million elements. ParaDyn runs on parallel computers provided by the Department of Energy Accelerated Strategic Computing Initiative (ASCI) and the Department of Defense High Performance Computing and Modernization Program. Preprocessing and post-processing software utilities and tools are designed to facilitate the generation of partitioned domains for processors on a massively parallel computer and the visualization of both resultant data and boundary data generated in a parallel simulation. This manual provides a brief overview of the parallel implementation; describes techniques for running the ParaDyn program, tools and utilities; and provides examples of parallel simulations.

  2. Parallel Climate Analysis Toolkit (ParCAT)

    SciTech Connect (OSTI)

    Smith, Brian Edward

    2013-06-30

    The parallel analysis toolkit (ParCAT) provides parallel statistical processing of large climate model simulation datasets. ParCAT provides parallel point-wise average calculations, frequency distributions, sum/differences of two datasets, and difference-of-average and average-of-difference for two datasets for arbitrary subsets of simulation time. ParCAT is a command-line utility that can be easily integrated in scripts or embedded in other applications. ParCAT supports CMIP5 post-processed datasets as well as non-CMIP5 post-processed datasets. ParCAT reads and writes standard netCDF files.
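    The point-wise reductions listed above are simple to state. A minimal sketch of two of them follows; the function names are illustrative, not ParCAT's interface, and the real tool operates on netCDF variables rather than Python lists.

```python
def pointwise_average(runs):
    """Point-wise average over several equally shaped series - the kind
    of reduction ParCAT distributes across processes."""
    return [sum(vals) / len(vals) for vals in zip(*runs)]

def difference_of_averages(runs_a, runs_b):
    """Average each ensemble point-wise, then subtract the results."""
    return [a - b for a, b in zip(pointwise_average(runs_a),
                                  pointwise_average(runs_b))]

avg = pointwise_average([[1.0, 3.0], [3.0, 5.0]])
diff = difference_of_averages([[2.0, 2.0]], [[1.0, 0.0]])
```

    In the parallel setting each process computes partial sums over its slice of the time axis, and a reduction combines them before dividing by the count.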

  3. Distributed parallel messaging for multiprocessor systems

    DOE Patents [OSTI]

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.

  4. Feature Clustering for Accelerating Parallel Coordinate Descent

    SciTech Connect (OSTI)

    Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.

    2012-12-06

    We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
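    A minimal sketch of the preconditioning idea: group highly correlated features so that a parallel block update touches at most one feature per cluster, since simultaneously updating correlated features causes interfering steps. The threshold and greedy scheme here are illustrative, not the paper's exact algorithm.

```python
import math

def correlation(X, j, k):
    """Pearson correlation between feature columns j and k of the
    row-major data matrix X."""
    n = len(X)
    a = [row[j] for row in X]
    b = [row[k] for row in X]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0

def cluster_features(X, threshold=0.9):
    """Greedy clustering: a feature joins the first cluster whose
    representative it correlates with beyond the threshold; otherwise it
    starts a new cluster."""
    clusters = []
    for j in range(len(X[0])):
        for c in clusters:
            if abs(correlation(X, j, c[0])) > threshold:
                c.append(j)
                break
        else:
            clusters.append([j])
    return clusters

# Feature 1 is an exact multiple of feature 0; feature 2 is unrelated.
X = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0], [4.0, 8.0, 1.0]]
clusters = cluster_features(X)
```

    A block-greedy coordinate descent step can then pick one feature from each cluster for its parallel update, which is the preconditioning effect the abstract refers to.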

  5. Parallel I/O in Practice

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    art. This tutorial sheds light on the state-of-the-art in parallel I/O and provides the knowledge necessary for attendees to best leverage the I/O resources available to them. We...

  6. Asynchronous parallel pattern search for nonlinear optimization

    SciTech Connect (OSTI)

    P. D. Hough; T. G. Kolda; V. J. Torczon

    2000-01-01

    Parallel pattern search (PPS) can be quite useful for engineering optimization problems characterized by a small number of variables (say 10--50) and by expensive objective function evaluations such as complex simulations that take from minutes to hours to run. However, PPS, which was originally designed for execution on homogeneous and tightly-coupled parallel machines, is not well suited to the more heterogeneous, loosely-coupled, and even fault-prone parallel systems available today. Specifically, PPS is hindered by synchronization penalties and cannot recover in the event of a failure. The authors introduce a new asynchronous and fault tolerant parallel pattern search (APPS) method and demonstrate its effectiveness on both simple test problems as well as some engineering optimization problems.
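    A minimal serial sketch of the underlying pattern (compass) search may help fix ideas. The asynchronous variant discussed above differs by dispatching the poll evaluations to workers and acting on results as they arrive rather than waiting for all of them; this sketch, with names of our choosing, shows only the basic poll-and-contract loop.

```python
def compass_search(f, x, step=1.0, tol=1e-6, max_iter=10000):
    """Derivative-free compass search: poll f at +/- step along each
    coordinate; move to the first improving point, otherwise halve the
    step, stopping once it falls below tol."""
    x = list(x)
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] += d
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return x, fx

# Minimize a simple quadratic with minimum at (3, -1).
best, val = compass_search(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2,
                           [0.0, 0.0])
```

    Note that f is only ever sampled, never differentiated, which is what makes the method attractive when each evaluation is an expensive simulation.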

  7. Parallel programming with PCN. Revision 1

    SciTech Connect (OSTI)

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).

  8. PISTON (Portable Data Parallel Visualization and Analysis)

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    in a data-parallel way. By using nVidia's freely downloadable Thrust library and our own tools, we can generate executable codes for different acceleration hardware architectures...

  9. Optimize Parallel Pumping Systems: Industrial Technologies Program...

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

... to operate the number of pumps needed to meet variable flow rate requirements efficiently. ... Parallel pumps provide balanced or equal flow rates when the same models are used and their ...

  10. HOPSPACK: Hybrid Optimization Parallel Search Package.

    SciTech Connect (OSTI)

    Gray, Genetha A.; Kolda, Tamara G.; Griffin, Joshua; Taddy, Matt; Martinez-Canales, Monica

    2008-12-01

In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel Search Package), a new software platform which facilitates combining multiple optimization routines into a single, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The framework is designed such that existing optimization source code can be easily incorporated with minimal code modification. By maintaining the integrity of each individual solver, the strengths and code sophistication of the original optimization package are retained and exploited.

  11. Cosmological stability bound in massive gravity and bigravity

    SciTech Connect (OSTI)

    Fasiello, Matteo; Tolley, Andrew J. E-mail: andrew.j.tolley@case.edu

    2013-12-01

We give a simple derivation of a cosmological bound on the graviton mass for spatially flat FRW solutions in massive gravity with an FRW reference metric and for bigravity theories. This bound comes from the requirement that the kinetic term of the helicity zero mode of the graviton is positive definite. The bound is dependent only on the parameters in the massive gravity potential and the Hubble expansion rate for the two metrics. We derive the decoupling limit of bigravity and FRW massive gravity, and use this to give an independent derivation of the cosmological bound. We recover our previous results that the tension between satisfying the Friedmann equation and the cosmological bound is sufficient to rule out all observationally relevant FRW solutions for massive gravity with an FRW reference metric. In contrast, in bigravity this tension is resolved due to the different nature of the Vainshtein mechanism. We find that in bigravity theories there exists an FRW solution with late-time self-acceleration for which the kinetic terms for the helicity-2, helicity-1 and helicity-0 modes are generically nonzero and positive, making this a compelling candidate for a model of cosmic acceleration. We confirm that the generalized bound is saturated for the candidate partially massless (bi)gravity theories but the existence of helicity-1/helicity-0 interactions implies the absence of the conjectured partially massless symmetry for both massive gravity and bigravity.

  12. Spontaneous Lorentz and diffeomorphism violation, massive modes, and gravity

    SciTech Connect (OSTI)

    Bluhm, Robert; Fung Shuhong; Kostelecky, V. Alan

    2008-03-15

    Theories with spontaneous local Lorentz and diffeomorphism violation contain massless Nambu-Goldstone modes, which arise as field excitations in the minimum of the symmetry-breaking potential. If the shape of the potential also allows excitations above the minimum, then an alternative gravitational Higgs mechanism can occur in which massive modes involving the metric appear. The origin and basic properties of the massive modes are addressed in the general context involving an arbitrary tensor vacuum value. Special attention is given to the case of bumblebee models, which are gravitationally coupled vector theories with spontaneous local Lorentz and diffeomorphism violation. Mode expansions are presented in both local and spacetime frames, revealing the Nambu-Goldstone and massive modes via decomposition of the metric and bumblebee fields, and the associated symmetry properties and gauge fixing are discussed. The class of bumblebee models with kinetic terms of the Maxwell form is used as a focus for more detailed study. The nature of the associated conservation laws and the interpretation as a candidate alternative to Einstein-Maxwell theory are investigated. Explicit examples involving smooth and Lagrange-multiplier potentials are studied to illustrate features of the massive modes, including their origin, nature, dispersion laws, and effects on gravitational interactions. In the weak static limit, the massive mode and Lagrange-multiplier fields are found to modify the Newton and Coulomb potentials. The nature and implications of these modifications are examined.

  13. Parallel phase model : a programming model for high-end parallel machines with manycores.

    SciTech Connect (OSTI)

    Wu, Junfeng; Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  14. Translation invariant time-dependent solutions to massive gravity

    SciTech Connect (OSTI)

    Mourad, J.; Steer, D.A. E-mail: steer@apc.univ-paris7.fr

    2013-12-01

Homogeneous time-dependent solutions of massive gravity generalise the plane wave solutions of the linearised Fierz-Pauli equations for a massive spin-two particle, as well as the Kasner solutions of General Relativity. We show that they also allow a clear counting of the degrees of freedom and represent a simplified framework to work out the constraints, the equations of motion and the initial value formulation. We work in the vielbein formulation of massive gravity, find the phase space resulting from the constraints and show that several disconnected sectors of solutions exist, some of which are unstable. The initial values determine the sector to which a solution belongs. Classically, the theory is not pathological but quantum mechanically the theory may suffer from instabilities. The latter are not due to an extra ghost-like degree of freedom.

  15. Massive Stars in Colliding Wind Systems: the GLAST Perspective

    SciTech Connect (OSTI)

    Reimer, Anita; Reimer, Olaf; /Stanford U., HEPL /KIPAC, Menlo Park

    2011-11-29

Colliding winds of massive stars in binary systems are considered as candidate sites of high-energy non-thermal photon emission. They are already among the suggested counterparts for a few individual unidentified EGRET sources, but may constitute a detectable source population for the GLAST observatory. The present work investigates such a population study of massive colliding wind systems at high-energy gamma-rays. Based on the recent detailed model (Reimer et al. 2006) for non-thermal photon production in prime candidate systems, we unveil the expected characteristics of this source class in the observables accessible at LAT energies. Combining the broadband emission model with the presently cataloged distribution of such systems and their individual parameters allows us to estimate the expected maximum number of LAT-detections among massive stars in colliding wind binary systems.

  16. Massive gravitational waves in Chern-Simons modified gravity

    SciTech Connect (OSTI)

    Myung, Yun Soo; Moon, Taeyoon E-mail: tymoon@inje.ac.kr

    2014-10-01

    We consider the nondynamical Chern-Simons (nCS) modified gravity, which is regarded as a parity-odd theory of massive gravity in four dimensions. We first find polarization modes of gravitational waves for θ=x/μ in nCS modified gravity by using the Newman-Penrose formalism where the null complex tetrad is necessary to specify gravitational waves. We show that in the Newman–Penrose formalism, the number of polarization modes is one in addition to an unspecified Ψ{sub 4}, implying three degrees of freedom for θ=x/μ. This compares with two for a canonical embedding of θ=t/μ. Also, if one introduces the Ricci tensor formalism to describe a massive graviton arising from the nCS modified gravity, one finds one massive mode after making second-order wave equations, which is compared to five found from the parity-even Einstein–Weyl gravity.

  17. Parallel Integral Curves (Book) | SciTech Connect

    Office of Scientific and Technical Information (OSTI)

    SciTech Connect Search Results Book: Parallel Integral Curves Citation Details In-Document Search Title: Parallel Integral Curves Authors: Pugmire, Dave 1 ; Peterka, Tom 2 ; ...

  18. Parallel Algorithms and Patterns (Technical Report) | SciTech...

    Office of Scientific and Technical Information (OSTI)

    Parallel Algorithms and Patterns Citation Details In-Document Search Title: Parallel Algorithms and Patterns Authors: Robey, Robert W. 1 + Show Author Affiliations Los Alamos ...

  19. Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications...

    Office of Scientific and Technical Information (OSTI)

    Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications Citation Details In-Document Search Title: Linux Kernel Co-Scheduling For Bulk Synchronous Parallel ...

  20. Hanford Waste Treatment Plant Sets Massive Protective Shield door in

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Pretreatment Facility | Department of Energy Waste Treatment Plant Sets Massive Protective Shield door in Pretreatment Facility Hanford Waste Treatment Plant Sets Massive Protective Shield door in Pretreatment Facility January 12, 2011 - 12:00pm Addthis The carbon steel doors come together to form an upside-down L-shape. The 102-ton door was set on top of the 85-ton door that was installed at the end of December. The carbon steel doors come together to form an upside-down L-shape. The

  1. Java Parallel Secure Stream for Grid Computing

    SciTech Connect (OSTI)

    Chen, Jie; Akers, Walter; Chen, Ying; Watson, William

    2001-09-01

The emergence of high speed wide area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance, due to the network tuning of TCP window size required to improve bandwidth and reduce latency on a high speed wide area network. This paper presents a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel streams is more effective than tuning TCP window size. In addition, X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally a few applications using this package will be discussed.
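The partitioning scheme the abstract describes can be sketched in miniature: split a payload into fixed-size partitions, push each partition down its own "stream" concurrently, and reassemble by partition index on the receiving side. Real JPARSS uses parallel TCP sockets (optionally over SSL); here plain in-memory structures stand in for the streams, so this is only an illustration of the data layout.

```python
import threading

# Toy sketch of the JPARSS idea: partition a payload, carry every
# n_streams-th partition on its own concurrent "stream", and reorder
# partitions by index at the receiver to rebuild the byte stream.
def parallel_send(payload: bytes, n_streams: int, chunk: int = 4):
    parts = [(i, payload[off:off + chunk])
             for i, off in enumerate(range(0, len(payload), chunk))]
    received = {}
    lock = threading.Lock()

    def stream_worker(stream_id):
        # Each stream carries every n_streams-th partition.
        for idx, data in parts[stream_id::n_streams]:
            with lock:
                received[idx] = data

    threads = [threading.Thread(target=stream_worker, args=(s,))
               for s in range(n_streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Receiver reorders partitions by index to rebuild the byte stream.
    return b"".join(received[i] for i in sorted(received))

assert parallel_send(b"the quick brown fox", 3) == b"the quick brown fox"
```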

  2. Parallel Harness for Informatic Stream Hashing

    Energy Science and Technology Software Center (OSTI)

    2012-09-11

PHISH is a lightweight framework which a set of independent processes can use to exchange data as they run on the same desktop machine, on processors of a parallel machine, or on different machines across a network. This enables them to work in a coordinated parallel fashion to perform computations on either streaming, archived, or self-generated data. The PHISH distribution includes a simple, portable library for performing data exchanges in useful patterns either via MPI message-passing or ZMQ sockets. PHISH input scripts are used to describe a data-processing algorithm, and additional tools provided in the PHISH distribution convert the script into a form that can be launched as a parallel job.

  3. Xyce parallel electronic simulator release notes.

    SciTech Connect (OSTI)

    Keiter, Eric Richard; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: Hardware and software requirements New features and enhancements Any defects fixed since the last release Current known defects and defect workarounds For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.

  4. Parallel Implementation of Power System Dynamic Simulation

    SciTech Connect (OSTI)

    Jin, Shuangshuang; Huang, Zhenyu; Diao, Ruisheng; Wu, Di; Chen, Yousu

    2013-07-21

    Dynamic simulation of power system transient stability is important for planning, monitoring, operation, and control of electrical power systems. However, modeling the system dynamics and network involves the computationally intensive time-domain solution of numerous differential and algebraic equations (DAE). This results in a transient stability implementation that may not maintain the real-time constraints of an online security assessment. This paper presents a parallel implementation of the dynamic simulation on a high-performance computing (HPC) platform using parallel simulation algorithms and computation architectures. It enables the simulation to run even faster than real time, enabling the “look-ahead” capability of upcoming stability problems in the power grid.

  5. Berkeley Unified Parallel C (UPC) Compiler

    Energy Science and Technology Software Center (OSTI)

    2003-04-06

This program is a portable, open-source compiler for the UPC language, which is based on the Open64 framework, and has extensive support for optimizations. This compiler operates by translating UPC into ANSI/ISO C for compilation by a native compiler and linking with a UPC Runtime Library. This design eases portability to both shared and distributed memory parallel architectures. For proper operation the "Berkeley Unified Parallel C (UPC) Runtime Library" and its dependencies are required. Compatible replacements which implement "The Berkeley UPC Runtime Specification" are possible.

  6. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.
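The mechanism in this patent abstract (gather per-call-site statistics, then pick a communications algorithm from them) can be sketched as follows. All names here are invented for illustration, not the patented PAMI API; the eager/rendezvous split and the 4096-byte threshold are common message-passing conventions used as assumptions.

```python
from collections import defaultdict
from statistics import mean

# Illustrative sketch: record per-call-site message sizes, then choose a
# transfer algorithm for each site from its observed traffic, as the
# abstract describes. Names and thresholds are hypothetical.
class CallSiteStats:
    def __init__(self, eager_limit=4096):
        self.sizes = defaultdict(list)
        self.eager_limit = eager_limit

    def record(self, call_site, nbytes):
        self.sizes[call_site].append(nbytes)

    def choose_algorithm(self, call_site):
        observed = self.sizes[call_site]
        if not observed:
            return "eager"          # default before any statistics exist
        # Small messages: copy-and-send ("eager"); large: handshake first.
        return "eager" if mean(observed) <= self.eager_limit else "rendezvous"

stats = CallSiteStats()
for n in (128, 256, 512):
    stats.record("send@line42", n)   # small messages at this call site
stats.record("send@line99", 10 * 1024 * 1024)   # one large transfer
```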

  7. Perturbation Theory of Massive Yang-Mills Fields

    DOE R&D Accomplishments [OSTI]

    Veltman, M.

    1968-08-01

    Perturbation theory of massive Yang-Mills fields is investigated with the help of the Bell-Treiman transformation. Diagrams containing one closed loop are shown to be convergent if there are more than four external vector boson lines. The investigation presented does not exclude the possibility that the theory is renormalizable.

  8. Searches for massive neutrinos in nuclear beta decay

    SciTech Connect (OSTI)

    Jaros, J.A.

    1992-10-01

The status of searches for massive neutrinos in nuclear beta decay is reviewed. The claim by an ITEP group that the electron antineutrino mass > 17eV has been disputed by all the subsequent experiments. Current measurements of the tritium beta spectrum limit m{sub {bar {nu}}e} < 10 eV.

  9. Parallel Algebraic Multigrids for Structural mechanics

    SciTech Connect (OSTI)

    Brezina, M; Tong, C; Becker, R

    2004-05-11

This paper presents the results of a comparison of three parallel algebraic multigrid (AMG) preconditioners for structural mechanics applications. In particular, the authors are interested in investigating both the scalability and robustness of the preconditioners. Numerical results are given for a range of structural mechanics problems with various degrees of difficulty.

  10. Parallel programming with PCN. Revision 2

    SciTech Connect (OSTI)

    Foster, I.; Tuecke, S.

    1993-01-01

PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  11. Linked-View Parallel Coordinate Plot Renderer

    Energy Science and Technology Software Center (OSTI)

    2011-06-28

    This software allows multiple linked views for interactive querying via map-based data selection, bar chart analytic overlays, and high dynamic range (HDR) line renderings. The major component of the visualization package is a parallel coordinate renderer with binning, curved layouts, shader-based rendering, and other techniques to allow interactive visualization of multidimensional data.

  12. The parallel virtual file system for portals.

    SciTech Connect (OSTI)

    Schutt, James Alan

    2004-04-01

    This report presents the result of an effort to re-implement the Parallel Virtual File System (PVFS) using Portals as the transport. This report provides short overviews of PVFS and Portals, and describes the design and implementation of PVFS over Portals. Finally, the results of performance testing of both stock PVFS and PVFS over Portals are presented.

  13. Message passing with parallel queue traversal

    DOE Patents [OSTI]

    Underwood, Keith D.; Brightwell, Ronald B.; Hemmert, K. Scott

    2012-05-01

    In message passing implementations, associative matching structures are used to permit list entries to be searched in parallel fashion, thereby avoiding the delay of linear list traversal. List management capabilities are provided to support list entry turnover semantics and priority ordering semantics.
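The idea in this patent abstract (search list entries in parallel rather than by linear traversal, while preserving priority ordering) can be sketched with a thread pool: scan fixed-size chunks of the list concurrently and keep the lowest-index hit. The associative hardware structures of the patent are replaced here by software threads, purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: instead of walking a match list linearly, search fixed-size
# chunks of it concurrently and keep the lowest-index hit, preserving
# the list's priority ordering.
def parallel_match(entries, predicate, chunk=4, workers=4):
    def scan(start):
        for i in range(start, min(start + chunk, len(entries))):
            if predicate(entries[i]):
                return i
        return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(scan, range(0, len(entries), chunk))
    matches = [h for h in hits if h is not None]
    return min(matches) if matches else None

# Posted receives keyed by (source, tag); find the first entry with tag 7.
posted = [(0, 3), (1, 7), (2, 5), (0, 7)]
idx = parallel_match(posted, lambda e: e[1] == 7)
```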

  14. Communication Graph Generator for Parallel Programs

    Energy Science and Technology Software Center (OSTI)

    2014-04-08

    Graphator is a collection of relatively simple sequential programs that generate communication graphs/matrices for commonly occurring patterns in parallel programs. Currently, there is support for five communication patterns: two-dimensional 4-point stencil, four-dimensional 8-point stencil, all-to-alls over sub-communicators, random near-neighbor communication, and near-neighbor communication.

  15. Parallel stitching of 2D materials

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Ling, Xi; Wu, Lijun; Lin, Yuxuan; Ma, Qiong; Wang, Ziqiang; Song, Yi; Yu, Lili; Huang, Shengxi; Fang, Wenjing; Zhang, Xu; et al

    2016-01-27

Diverse parallel stitched 2D heterostructures, including metal–semiconductor, semiconductor–semiconductor, and insulator–semiconductor, are synthesized directly through selective “sowing” of aromatic molecules as the seeds in the chemical vapor deposition (CVD) method. Moreover, the methodology enables the large-scale fabrication of lateral heterostructures, which offers tremendous potential for its application in integrated circuits.

  16. Parallel Performance of a Combustion Chemistry Simulation

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Skinner, Gregg; Eigenmann, Rudolf

    1995-01-01

    We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.

  17. Collectively loading an application in a parallel computer

    DOE Patents [OSTI]

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Miller, Samuel J.; Mundy, Michael B.

    2016-01-05

    Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
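The collective-load steps in this patent abstract (identify a subset of nodes, select a job leader, have the leader retrieve the application, then broadcast it to the subset) can be sketched as follows. A plain function call stands in for the network broadcast, and all names are illustrative, not the patented interface.

```python
# Sketch of the collective-load pattern: one node in the selected subset
# is chosen as job leader, reads the application image once, and
# distributes it to the rest of the subset.
def collective_load(nodes, job, read_image):
    subset = [n for n in nodes if n["free"]]   # nodes selected for the job
    leader = subset[0]                         # job leader compute node
    image = read_image(job)                    # leader retrieves the binary
    for node in subset:                        # "broadcast" to the subset
        node["image"] = image
    return leader, subset

# Usage: six nodes, every other one free; load a hypothetical "app.elf".
nodes = [{"id": i, "free": i % 2 == 0, "image": None} for i in range(6)]
leader, subset = collective_load(nodes, "app.elf", lambda job: b"\x7fELF")
```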

  18. Multitasking TORT under UNICOS: Parallel performance models and measurements

    SciTech Connect (OSTI)

    Barnett, A.; Azmy, Y.Y.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
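A parallel-overhead model of the general kind compared with measurements in this abstract can be written down explicitly: total time is a fixed serial part, a perfectly divisible work part, and a coordination term that grows with processor count. The functional form and constants below are illustrative assumptions, not the TORT model itself.

```python
# Illustrative overhead model: T(p) = T_serial + T_work / p + c * (p - 1),
# where the last term models per-processor coordination cost.
def predicted_time(p, t_serial=2.0, t_work=96.0, c=0.5):
    return t_serial + t_work / p + c * (p - 1)

def speedup(p, **model):
    # Speedup relative to the one-processor prediction.
    return predicted_time(1, **model) / predicted_time(p, **model)
```

A model like this makes the largest overhead contributors visible: as p grows, the c*(p-1) term eventually dominates and speedup saturates, which is the kind of behavior the measured parallel overhead is checked against.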

  19. Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing

    SciTech Connect (OSTI)

    Widlund, Olof B.

    2015-06-09

The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations such as those of linear elasticity and electromagnetics. These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equations. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver of a coarse model in order to have a performance which is independent of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.

  20. FETI Prime Domain Decomposition base Parallel Iterative Solver Library Ver.1.0

    Energy Science and Technology Software Center (OSTI)

    2003-09-15

FETI Prime is a library for the iterative solution of linear equations in solid and structural mechanics. The algorithm employs preconditioned conjugate gradients, with a domain decomposition-based preconditioner. The software is written in C++ and is designed for use with massively parallel computers, using MPI. The algorithm is based on the FETI-DP method, with additional capabilities for handling constraint equations, as well as interfacing with the Salinas structural dynamics code and the Finite Element Interface (FEI) library. Practical Application: FETI Prime is designed for use with finite element-based simulation codes for solid and structural mechanics. The solver uses element matrices, connectivity information, nodal information, and force vectors computed by the host code and provides back the solution to the linear system of equations, to the user-specified level of accuracy. The library is compiled with the host code and becomes an integral part of the host code executable.

  1. A Pervasive Parallel Processing Framework For Data Visualization And Analysis At Extreme Scale Final Scientific and Technical Report

    SciTech Connect (OSTI)

    Geveci, Berk

    2014-10-31

The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends infer that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
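The worklet concept in the abstract can be sketched in a few lines: a stateless per-element operation mapped over the data with fine-grained parallelism, rather than a pipeline filter that processes a whole dataset when the pipeline executes. The worklet and dispatcher names below are illustrative, not the project's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

# A worklet is a localized, stateless operation: its output depends only
# on the one input element it is handed.
def magnitude_worklet(point):
    x, y, z = point
    return (x * x + y * y + z * z) ** 0.5

# A dispatcher maps the worklet over the field with a pool of lightweight
# threads; nothing is shared between invocations, so they can run in any
# order or all at once.
def dispatch(worklet, field, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worklet, field))

mags = dispatch(magnitude_worklet, [(3, 4, 0), (1, 2, 2), (0, 0, 5)])
```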

  2. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

Data communications in a parallel active messaging interface ('PAMI') of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution of a compute node, including specification of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI, including receiving a data communications instruction, the instruction characterized by instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.

  3. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-29

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.

  4. Translation invariant time-dependent solutions to massive gravity II

    SciTech Connect (OSTI)

    Mourad, J.; Steer, D.A. E-mail: steer@apc.univ-paris7.fr

    2014-06-01

This paper is a sequel to JCAP 12 (2013) 004 and is also devoted to translation-invariant solutions of ghost-free massive gravity in its moving frame formulation. Here we consider a mass term which is linear in the vielbein (corresponding to a β{sub 3} term in the 4D metric formulation) in addition to the cosmological constant. We determine explicitly the constraints, and from the initial value formulation show that the time-dependent solutions can have singularities at a finite time. Although the constraints give, as in the β{sub 1} case, the correct number of degrees of freedom for a massive spin two field, we show that the lapse function can change sign at a finite time causing a singular time evolution. This is very different to the β{sub 1} case where time evolution is always well defined. We conclude that the β{sub 3} mass term can be pathological and should be treated with care.

  5. Searches for massive neutrinos in nuclear beta decay

    SciTech Connect (OSTI)

    Jaros, J.A.

    1992-10-01

    The status of searches for massive neutrinos in nuclear beta decay is reviewed. The claim by an ITEP group that the electron antineutrino mass is > 17 eV has been disputed by all the subsequent experiments. Current measurements of the tritium beta spectrum limit m{sub {bar {nu}}e} < 10 eV. The status of the 17 keV neutrino is reviewed. The strong null results from INS Tokyo and Argonne, and deficiencies in the experiments which reported positive effects, make it unreasonable to ascribe the spectral distortions seen by Simpson, Hime, and others to a 17 keV neutrino. Several new ideas on how to search for massive neutrinos in nuclear beta decay are discussed.

  6. INTERNAL GRAVITY WAVES IN MASSIVE STARS: ANGULAR MOMENTUM TRANSPORT

    SciTech Connect (OSTI)

    Rogers, T. M.; Lin, D. N. C.; McElwaine, J. N.; Lau, H. H. B. E-mail: lin@ucolick.org E-mail: hblau@astro.uni-bonn.de

    2013-07-20

    We present numerical simulations of internal gravity waves (IGW) in a star with a convective core and extended radiative envelope. We report on amplitudes, spectra, dissipation, and consequent angular momentum transport by such waves. We find that these waves are generated efficiently and transport angular momentum on short timescales over large distances. We show that, as in Earth's atmosphere, IGW drive equatorial flows which change magnitude and direction on short timescales. These results have profound consequences for the observational inferences of massive stars, as well as their long-term angular momentum evolution. We suggest IGW angular momentum transport may explain many observational mysteries, such as: the misalignment of hot Jupiters around hot stars, the Be class of stars, nitrogen enrichment anomalies in massive stars, and the non-synchronous orbits of interacting binaries.

  7. The halo model in a massive neutrino cosmology

    SciTech Connect (OSTI)

    Massara, Elena; Villaescusa-Navarro, Francisco; Viel, Matteo E-mail: villaescusa@oats.inaf.it

    2014-12-01

    We provide a quantitative analysis of the halo model in the context of massive neutrino cosmologies. We discuss all the ingredients necessary to model the non-linear matter and cold dark matter power spectra and compare with the results of N-body simulations that incorporate massive neutrinos. Our neutrino halo model is able to capture the non-linear behavior of matter clustering with a ~20% accuracy up to very non-linear scales of k = 10 h/Mpc (which would be affected by baryon physics). The largest discrepancies arise in the range k = 0.5 - 1 h/Mpc, where the 1-halo and 2-halo terms are comparable, and are present also in a massless neutrino cosmology. However, at scales k < 0.2 h/Mpc our neutrino halo model agrees with the results of N-body simulations at the level of 8% for total neutrino masses of < 0.3 eV. We also model the neutrino non-linear density field as a sum of a linear and a clustered component and predict the neutrino power spectrum and the cold dark matter-neutrino cross-power spectrum up to k = 1 h/Mpc with ~30% accuracy. For masses below 0.15 eV the neutrino halo model captures the neutrino-induced suppression, cast in terms of matter power ratios between massive and massless scenarios, with a 2% agreement with the results of N-body/neutrino simulations. Finally, we provide a simple application of the halo model: the computation of the clustering of galaxies, in massless and massive neutrino cosmologies, using a simple Halo Occupation Distribution scheme and our halo model extension.

  8. THE ROLE OF THE MAGNETOROTATIONAL INSTABILITY IN MASSIVE STARS

    SciTech Connect (OSTI)

    Wheeler, J. Craig; Kagan, Daniel; Chatzopoulos, Emmanouil

    2015-01-20

    The magnetorotational instability (MRI) is key to physics in accretion disks and is widely considered to play some role in massive star core collapse. Models of rotating massive stars naturally develop very strong shear at composition boundaries, a necessary condition for MRI instability, and the MRI is subject to triply diffusive destabilizing effects in radiative regions. We have used the MESA stellar evolution code to compute magnetic effects due to the Spruit-Tayler (ST) mechanism and the MRI, separately and together, in a sample of massive star models. We find that the MRI can be active in the later stages of massive star evolution, leading to mixing effects that are not captured in models that neglect the MRI. The MRI and related magnetorotational effects can move models of given zero-age main sequence mass across ''boundaries'' from degenerate CO cores to degenerate O/Ne/Mg cores and from degenerate O/Ne/Mg cores to iron cores, thus affecting the final evolution and the physics of core collapse. The MRI acting alone can slow the rotation of the inner core in general agreement with the observed ''initial'' rotation rates of pulsars. The MRI analysis suggests that localized fields of ~10{sup 12} G may exist at the boundary of the iron core. With both the ST and MRI mechanisms active in the 20 M{sub ⊙} model, we find that the helium shell mixes entirely out into the envelope. Enhanced mixing could yield a population of yellow or even blue supergiant supernova progenitors that would not be standard SN IIP.

  9. A symmetric approach to the massive nonlinear sigma model

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Ferrari, Ruggero

    2011-09-28

    In the present study we extend to the massive case the procedure of divergences subtraction, previously introduced for the massless nonlinear sigma model (D = 4). Perturbative expansion in the number of loops is successfully constructed. The resulting theory depends on the Spontaneous Symmetry Breaking parameter v, on the mass m, and on the radiative correction parameter Λ. Fermions are not considered in the present work. SU(2) ⊗ SU(2) is the group used.

  10. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    SciTech Connect (OSTI)

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving, only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm, demonstrate its capabilities on both two-dimensional and three-dimensional surface geometries, and compare the resulting parallel-produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  11. Locating hardware faults in a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-04-13

    Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.

  12. Parallel machine architecture for production rule systems

    DOE Patents [OSTI]

    Allen, Jr., John D.; Butler, Philip L.

    1989-01-01

    A parallel processing system for production rule programs utilizes a host processor for storing production rule right-hand sides (RHSs) and a plurality of rule processors for storing left-hand sides (LHSs). The rule processors operate in parallel in the Recognize phase of the system's Recognize-Act cycle to match their respective LHSs against a stored list of working memory elements (WMEs) in order to find a self-consistent set of WMEs. The list of WMEs is dynamically varied during the Act phase of the cycle, in which the host executes or fires rule RHSs for those rules for which a self-consistent set has been found by the rule processors. The host transmits instructions for creating or deleting working memory elements as dictated by the rule firings until the rule processors are unable to find any further self-consistent working memory element sets, at which time the production rule system is halted.
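
The Recognize-Act cycle described above can be sketched in a few lines: LHS matching is farmed out in parallel (standing in for the rule processors), while a host loop fires RHSs and updates working memory. The rule and WME encodings below are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

# Each rule is (LHS predicate over working memory, RHS action); the RHS
# returns the WMEs to add and to remove when the rule fires.
rules = [
    (lambda wm: "hungry" in wm and "has-food" in wm,
     lambda wm: ({"eating"}, {"hungry"})),
    (lambda wm: "eating" in wm,
     lambda wm: ({"done"}, {"eating"})),
]

def recognize_act(wm, max_cycles=10):
    """Run the Recognize-Act cycle: match all LHSs in parallel against a
    snapshot of working memory, then fire one matching rule on the host."""
    with ThreadPoolExecutor() as pool:
        for _ in range(max_cycles):
            # Recognize: every LHS is evaluated against the same WM snapshot.
            matches = list(pool.map(lambda r: r[0](wm), rules))
            fired = next((r for r, ok in zip(rules, matches) if ok), None)
            if fired is None:
                break  # no self-consistent match remains: halt
            # Act: the "host" fires the RHS, creating/deleting WMEs.
            add, remove = fired[1](wm)
            wm = (wm - remove) | add
    return wm

final = recognize_act({"hungry", "has-food"})
```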

  13. The evolutionary tracks of young massive star clusters

    SciTech Connect (OSTI)

    Pfalzner, S.; Steinhausen, M.; Vincke, K.; Menten, K.; Parmentier, G.

    2014-10-20

    Stars mostly form in groups consisting of a few dozen to several tens of thousands of members. For 30 years, theoretical models have provided a basic concept of how such star clusters form and develop: they originate from the gas and dust of collapsing molecular clouds. Because the conversion from gas to stars is incomplete, the leftover gas is expelled, leading to cluster expansion and stars becoming unbound. Observationally, a direct confirmation of this process has proved elusive, which is attributed to the diversity of the properties of forming clusters. Here we take into account that the true cluster masses and sizes are masked, initially by the surface density of the background and later by the still present unbound stars. Based on the recent observational finding that in a given star-forming region the star formation efficiency depends on the local density of the gas, we use an analytical approach combined with N-body simulations to reveal evolutionary tracks for young massive clusters covering the first 10 Myr. Just as the Hertzsprung-Russell diagram is a measure of the evolution of stars, these tracks provide equivalent information for clusters. Like stars, massive clusters form and develop faster than their lower-mass counterparts, explaining why so few massive cluster progenitors are found.

  14. Three-dimensional Casimir piston for massive scalar fields

    SciTech Connect (OSTI)

    Lim, S.C. Teo, L.P.

    2009-08-15

    We consider the Casimir force acting on a three-dimensional rectangular piston due to a massive scalar field subject to periodic, Dirichlet, and Neumann boundary conditions. An exponential cut-off method is used to derive the Casimir energy. It is shown that the divergent terms do not contribute to the Casimir force acting on the piston, thus rendering a finite, well-defined Casimir force acting on the piston. Explicit expressions for the total Casimir force acting on the piston are derived, which show that the Casimir force is always attractive for all the different boundary conditions considered. As a function of a, the distance from the piston to the opposite wall, the magnitude of the Casimir force behaves like 1/a{sup 4} when a → 0{sup +} and decays exponentially when a → ∞. Moreover, the magnitude of the Casimir force is always a decreasing function of a. On the other hand, passing from massless to massive, we find that the effect of the mass is insignificant when a is small, but the magnitude of the force is decreased for large a in the massive case.

  15. Runtime System Library for Parallel Weather Modules

    Energy Science and Technology Software Center (OSTI)

    1997-07-22

    RSL is a Fortran-callable runtime library for use in implementing regular-grid weather forecast models, with nesting, on scalable distributed memory parallel computers. It provides high-level routines for finite-difference stencil communications and inter-domain exchange of data for nested forcing and feedback. RSL supports a unique point-wise domain-decomposition strategy to facilitate load-balancing.

  16. FORTRAN Extensions for Modular Parallel Processing

    Energy Science and Technology Software Center (OSTI)

    1996-01-12

    FORTRAN M is a small set of extensions to FORTRAN that supports a modular approach to the construction of sequential and parallel programs. FORTRAN M programs use channels to plug together processes which may be written in FORTRAN M or FORTRAN 77. Processes communicate by sending and receiving messages on channels. Channels and processes can be created dynamically, but programs remain deterministic unless specialized nondeterministic constructs are used.

  17. Parallel Integrated Thermal Management - Energy Innovation Portal

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Parallel Integrated Thermal Management. National Renewable Energy Laboratory. Technology Marketing Summary: Many current cooling systems for hybrid electric vehicles (HEVs) with a high-power electric drive system utilize a low-temperature liquid cooling loop for cooling the power electronics system and electric machines associated with the electric

  18. Parallel Heuristics for Scalable Community Detection

    SciTech Connect (OSTI)

    Lu, Howard; Kalyanaraman, Anantharaman; Halappanavar, Mahantesh; Choudhury, Sutanay

    2014-05-17

    Community detection has become a fundamental operation in numerous graph-theoretic applications. It is used to reveal natural divisions that exist within real world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed by Blondel et al. in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability to problems that can be solved on desktops. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose multiple heuristics that are designed to break the sequential barrier. Our heuristics are agnostic to the underlying parallel architecture. For evaluation purposes, we implemented our heuristics on shared memory (OpenMP) and distributed memory (MapReduce-MPI) machines, and tested them over real world graphs derived from multiple application domains (internet, biological, natural language processing). Experimental results demonstrate the ability of our heuristics to converge to high modularity solutions comparable to those output by the serial algorithm in nearly the same number of iterations, while also drastically reducing time to solution.
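
A minimal, serial sketch of the Louvain local-move phase (the serial template the paper starts from) may help; the parallel heuristics relax its one-node-at-a-time ordering. The graph encoding below is invented, and the modularity gain is computed only up to a constant factor, which preserves the ranking of candidate communities:

```python
from collections import defaultdict

def louvain_pass(adj):
    """One greedy local-move phase of the Louvain method.
    adj: {node: {neighbor: weight}} for an undirected graph."""
    m2 = sum(w for nbrs in adj.values() for w in nbrs.values())  # equals 2m
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    comm = {v: v for v in adj}        # start from singleton communities
    comm_deg = dict(degree)           # total degree per community

    improved = True
    while improved:
        improved = False
        for v in adj:
            comm_deg[comm[v]] -= degree[v]      # take v out of its community
            # weight of edges from v into each neighboring community
            k_in = defaultdict(float)
            for u, w in adj[v].items():
                k_in[comm[u]] += w
            # pick the community with the best (scaled) modularity gain
            best, best_gain = comm[v], 0.0
            for c, kin in k_in.items():
                gain = kin / m2 - comm_deg[c] * degree[v] / (m2 * m2)
                if gain > best_gain:
                    best, best_gain = c, gain
            comm_deg[best] += degree[v]
            if best != comm[v]:
                comm[v], improved = best, True
    return comm

# Two triangles bridged by a single edge: the pass should find two communities.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: {} for v in range(6)}
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0
labels = louvain_pass(adj)
```

The full Louvain method then contracts each community into a super-node and repeats; the sketch stops after the first level for brevity.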

  19. Parallel Molecular Dynamics Program for Molecules

    Energy Science and Technology Software Center (OSTI)

    1995-03-07

    ParBond is a parallel classical molecular dynamics code that models bonded molecular systems, typically of an organic nature. It uses classical force fields for both non-bonded Coulombic and Van der Waals interactions and for 2-, 3-, and 4-body bonded (bond, angle, dihedral, and improper) interactions. It integrates Newton's equations of motion for the molecular system and evaluates various thermodynamic properties of the system as it progresses.

  20. Xyce parallel electronic simulator : reference guide.

    SciTech Connect (OSTI)

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures, including single-processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.

  1. Xyce(™) Parallel Electronic Simulator

    Energy Science and Technology Software Center (OSTI)

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE, and transient mode using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load balancing and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.
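
The solver stack described above (Kirchhoff's current law via nodal analysis, solved with a fully coupled Newton method) can be illustrated on a single-node circuit: a current source driving a resistor and a diode in parallel. Component values and function names are invented for the sketch; real Xyce device models and netlists are far richer:

```python
import math

R, I_src = 1e3, 1e-3          # 1 kOhm resistor, 1 mA current source
I_s, V_t = 1e-12, 0.02585     # diode saturation current, thermal voltage

def kcl(v):
    """Sum of currents leaving the single node (must be zero at solution)."""
    return v / R + I_s * (math.exp(v / V_t) - 1.0) - I_src

def kcl_jac(v):
    """Derivative of kcl with respect to the node voltage (1x1 Jacobian)."""
    return 1.0 / R + (I_s / V_t) * math.exp(v / V_t)

def newton(v=0.5, tol=1e-12, max_iter=100):
    """Fully coupled Newton iteration on the nodal equation."""
    for _ in range(max_iter):
        dv = -kcl(v) / kcl_jac(v)
        v += dv
        if abs(dv) < tol:
            break
    return v

v_node = newton()
```

With more nodes, `kcl` becomes a vector residual and `kcl_jac` a sparse matrix, whose linear solves are where the sparse-direct or iterative (Trilinos) solvers enter.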

  2. Characterizing the parallelism in rule-based expert systems

    SciTech Connect (OSTI)

    Douglass, R.J.

    1984-01-01

    A brief review of two classes of rule-based expert systems is presented, followed by a detailed analysis of potential sources of parallelism at the production or rule level, the subrule level (including match, select, and act parallelism), and the search level (including AND, OR, and stream parallelism). The potential amount of parallelism from each source is discussed and characterized in terms of its granularity, inherent serial constraints, efficiency, speedup, dynamic behavior, and communication volume, frequency, and topology. Subrule parallelism will yield, at best, two- to tenfold speedup, and rule-level parallelism will yield a modest speedup on the order of 5 to 10 times. Rule-level parallelism can be combined with OR, AND, and stream parallelism in many instances to yield further parallel speedups.

  3. SimFS: A Large Scale Parallel File System Simulator

    Energy Science and Technology Software Center (OSTI)

    2011-08-30

    The software provides both framework and tools to simulate a large-scale parallel file system such as Lustre.

  4. Parallelizing AT with MatlabMPI

    SciTech Connect (OSTI)

    Li, Evan Y.; /Brown U. /SLAC

    2011-06-22

    The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, establishing the necessary prerequisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate highly efficient speed increments per processor in AT's beam-tracking functions. Extrapolating from these predictions, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while

  5. A brief parallel I/O tutorial.

    SciTech Connect (OSTI)

    Ward, H. Lee

    2010-03-01

    This document provides common best practices for the efficient utilization of parallel file systems for analysts and application developers. A multi-program, parallel supercomputer is able to provide effective compute power by aggregating a host of lower-power processors using a network. The idea, in general, is that one either constructs the application to distribute parts to the different nodes and processors available and then collects the result (a parallel application), or one launches a large number of small jobs, each doing similar work on different subsets (a campaign). The I/O system on these machines is usually implemented as a tightly coupled, parallel application itself. It provides the concept of a 'file' to the host applications. The 'file' is an addressable store of bytes, and that address space is global in nature. In essence, the I/O system provides a global address space. Beyond the simple reality that the I/O system is normally composed of a small, less capable collection of hardware, that concept of a global address space will cause problems if not very carefully utilized. How much of a problem and the ways in which those problems manifest will be different, but that it is problem prone has been well established. Worse, the file system is a shared resource on the machine: a system service. What an application does when it uses the file system impacts all users. It is not the case that some portion of the available resource is reserved. Instead, the I/O system responds to requests by scheduling and queuing based on instantaneous demand. Using the system well contributes to the overall throughput on the machine. From a solely self-centered perspective, using it well reduces the time that the application or campaign is subject to impact by others.
The developer's goal should be to accomplish I/O in a way that minimizes interaction with the I/O system, maximizes the amount of data moved per call, and provides the I/O system the most information about
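
The "maximize data moved per call" advice above can be sketched with a simple aggregation pattern: batch many small records into one large buffered write instead of issuing one request per record. File names and record sizes below are invented for the demo:

```python
import io
import os
import tempfile

def write_unbuffered(path, records):
    """One write call per record: many small requests hit the file system."""
    with open(path, "wb", buffering=0) as f:
        for rec in records:
            f.write(rec)

def write_aggregated(path, records, chunk=1 << 20):
    """Aggregate records in memory and flush in ~1 MiB chunks."""
    buf = io.BytesIO()
    with open(path, "wb", buffering=0) as f:
        for rec in records:
            buf.write(rec)
            if buf.tell() >= chunk:
                f.write(buf.getvalue())   # one large request
                buf.seek(0)
                buf.truncate()
        if buf.tell():
            f.write(buf.getvalue())       # flush the remainder

records = [b"x" * 100 for _ in range(10_000)]
with tempfile.TemporaryDirectory() as d:
    p1, p2 = os.path.join(d, "a.bin"), os.path.join(d, "b.bin")
    write_unbuffered(p1, records)
    write_aggregated(p2, records)
    same = os.path.getsize(p1) == os.path.getsize(p2) == 1_000_000
```

Both variants produce identical files; the aggregated one issues on the order of a handful of write calls rather than ten thousand.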

  6. Parallel State Estimation Assessment with Practical Data

    SciTech Connect (OSTI)

    Chen, Yousu; Jin, Shuangshuang; Rice, Mark J.; Huang, Zhenyu

    2014-10-31

    This paper presents a full-cycle parallel state estimation (PSE) implementation using a preconditioned conjugate gradient algorithm. The developed code is able to solve large-size power system state estimation within 5 seconds using real-world data, comparable to the Supervisory Control And Data Acquisition (SCADA) rate. This achievement allows the operators to know the system status much faster to help improve grid reliability. Case study results of the Bonneville Power Administration (BPA) system with real measurements are presented. The benefits of fast state estimation are also discussed.
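
The preconditioned conjugate gradient kernel the paper builds on can be sketched with a simple Jacobi (diagonal) preconditioner; the small matrix below is an invented stand-in for the estimator's symmetric positive definite gain matrix:

```python
def pcg(A, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A (list-of-lists)."""
    n = len(b)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    m_inv = [1.0 / A[i][i] for i in range(n)]      # Jacobi preconditioner
    x = [0.0] * n
    r = b[:]                                        # residual b - A @ 0
    z = [m_inv[i] * r[i] for i in range(n)]
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b)
```

In the paper's setting the matrix-vector products and inner products are the distributed, parallelized steps; this serial sketch only shows the algorithm's shape.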

  7. SPRNG Scalable Parallel Random Number Generator LIbrary

    Energy Science and Technology Software Center (OSTI)

    2010-03-16

    This revision corrects some errors in SPRNG 1. Users of newer SPRNG versions can obtain the corrected files and build their version with it. This version also improves the scalability of some of the application-based tests in the SPRNG test suite. It also includes an interface to a parallel Mersenne Twister, so that if users install the Mersenne Twister, they can test this generator with the SPRNG test suite and also use some SPRNG features with that generator.

  8. Parallel heater system for subsurface formations

    DOE Patents [OSTI]

    Harris, Christopher Kelvin (Houston, TX); Karanikas, John Michael (Houston, TX); Nguyen, Scott Vinh (Houston, TX)

    2011-10-25

    A heating system for a subsurface formation is disclosed. The system includes a plurality of substantially horizontally oriented or inclined heater sections located in a hydrocarbon containing layer in the formation. At least a portion of two of the heater sections are substantially parallel to each other. The ends of at least two of the heater sections in the layer are electrically coupled to a substantially horizontal, or inclined, electrical conductor oriented substantially perpendicular to the ends of the at least two heater sections.

  9. Requirements for Parallel I/O,

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Requirements for Parallel I/O, Visualization and Analysis. Prabhat (LBL/NERSC), Quincey Koziol (The HDF Group). NERSC ASCR Requirements for 2017, January 15, 2014, LBNL. 1. Project Description: m636 repo. LBL Vis Base Program (Bethel PI) [PM: Nowell]: conduct fundamental and applied vis/analytics R&D to address exascale challenges. ExaHDF5 Project (Prabhat, Quincey PIs) [PM: Nowell]: scale parallel I/O and data management technologies for current

  10. Carbothermic reduction with parallel heat sources

    DOE Patents [OSTI]

    Troup, Robert L.; Stevenson, David T.

    1984-12-04

    Disclosed are an apparatus and a method of carbothermic direct reduction for producing an aluminum alloy from a raw material mix including aluminum oxide, silicon oxide, and carbon, wherein parallel heat sources are provided by a combustion heat source and by an electrical heat source at essentially the same position in the reactor, e.g., at the same horizontal level in the path of a gravity-fed moving bed in a vertical reactor. The present invention includes providing at least 79% of the heat energy required in the process by the electrical heat source.