National Library of Energy BETA

Sample records for massively parallel microcell-based

  1. Massively Parallel Models of the Human Circulatory System (Conference...

    Office of Scientific and Technical Information (OSTI)

  2. Massively Parallel Models of the Human Circulatory System (Conference...

    Office of Scientific and Technical Information (OSTI)

    Authors: Randles, A.; ...

  3. Impact analysis on a massively parallel computer

    SciTech Connect (OSTI)

    Zacharia, T.; Aramayo, G.A.

    1994-06-01

    Advanced mathematical techniques and computer simulation play a major role in evaluating and enhancing the design of beverage cans, industrial, and transportation containers for improved performance. Numerical models are used to evaluate the impact requirements of containers used by the Department of Energy (DOE) for transporting radioactive materials. Many of these models are highly compute-intensive. An analysis may require several hours of computational time on current supercomputers despite the simplicity of the models being studied. As computer simulations and materials databases grow in complexity, massively parallel computers have become important tools. Massively parallel computational research at the Oak Ridge National Laboratory (ORNL) and its application to the impact analysis of shipping containers is briefly described in this paper.

  4. Template based parallel checkpointing in a massively parallel computer system

    DOE Patents [OSTI]

    Archer, Charles Jens (Rochester, MN); Inglett, Todd Alan (Rochester, MN)

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
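    The template comparison described above can be sketched in a few lines. This is a minimal single-node illustration, not the patented protocol: fixed-size blocks, SHA-1 block digests, and the `delta_checkpoint`/`restore` helper names are all assumptions for the sketch.

```python
import hashlib
import zlib

BLOCK = 4096  # hypothetical block size

def block_digests(data: bytes, block: int = BLOCK) -> list:
    """Checksum each fixed-size block of a checkpoint image."""
    return [hashlib.sha1(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]

def delta_checkpoint(template: bytes, current: bytes, block: int = BLOCK):
    """Compare the node's state to the template checkpoint and return
    the digest list plus only the blocks that differ, compressed."""
    tmpl = block_digests(template, block)
    cur = block_digests(current, block)
    changed = {}
    for i, digest in enumerate(cur):
        if i >= len(tmpl) or tmpl[i] != digest:
            changed[i] = zlib.compress(current[i * block:(i + 1) * block])
    return cur, changed

def restore(template: bytes, changed, block: int = BLOCK) -> bytes:
    """Rebuild the node's state from the template plus the delta
    (assumes template and checkpoint have the same length)."""
    out = bytearray(template)
    for i, blob in changed.items():
        out[i * block:(i + 1) * block] = zlib.decompress(blob)
    return bytes(out)
```

    Only the `changed` dictionary (and the small digest list) would need to cross the network, which is the source of the bandwidth savings the patent claims.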

  5. A Massively Parallel Solver for the Mechanical Harmonic Analysis...

    Office of Scientific and Technical Information (OSTI)

    ACE3P is a 3D massively parallel simulation suite that...

  6. MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS

    SciTech Connect (OSTI)

    Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston

    2013-05-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them both to take advantage of current HPC architectures and to prepare efficiently for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design is given. Representative massively parallel results from several application areas are presented, and a brief discussion of future areas of research for the framework is provided.

  7. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and...

    Office of Scientific and Technical Information (OSTI)

    Technical Report: PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

  8. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and...

    Office of Scientific and Technical Information (OSTI)

    Authors: Lichtner, Peter, OFM Research; Karra, Satish, Los...

  9. Massively parallel mesh generation for physics codes

    SciTech Connect (OSTI)

    Hardin, D.D.

    1996-06-01

    Massively parallel processors (MPPs) will soon enable realistic 3-D physical modeling of complex objects and systems. Work is planned or presently underway to port many of LLNL's physical modeling codes to MPPs. LLNL's DSI3D electromagnetics code already can solve 40+ million zone problems on the 256-processor Meiko. However, the author lacks the software necessary to generate and manipulate the large meshes needed to model many complicated 3-D geometries. State-of-the-art commercial mesh generators run on workstations and have a practical limit of several hundred thousand elements. In the foreseeable future MPPs will solve problems with a billion mesh elements. The objective of the Parallel Mesh Generation (PMESH) Project is to develop a unique mesh generation system that can construct large 3-D meshes (up to a billion elements) on MPPs. Such a capability will remove a critical roadblock to unleashing the power of MPPs for physical analysis and will put LLNL at the forefront of mesh generation technology. PMESH will "front-end" a variety of LLNL 3-D physics codes, including those in the areas of electromagnetics, structural mechanics, thermal analysis, and hydrodynamics. The DSI3D and DYNA3D codes are already running on MPPs. The primary goal of the PMESH project is to provide the robust generation of large meshes for complicated 3-D geometries through the appropriate distribution of the generation task between the user's workstation and the MPP. Secondary goals are to support the unique features of LLNL physics codes (e.g., unusual elements) and to minimize the user effort required to generate different meshes for the same geometry. PMESH's capabilities are essential because mesh generation is presently a major limiting factor in simulating larger and more complex 3-D geometries. PMESH will significantly enhance LLNL's capabilities in physical simulation by advancing the state-of-the-art in large mesh generation by 2 to 3 orders of magnitude.

  10. Discontinuous Methods for Accurate, Massively Parallel Quantum Molecular Dynamics

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    John Pask is Lead Principal Investigator for Discontinuous Methods for Accurate, Massively Parallel Quantum Molecular Dynamics. We develop and apply a recent breakthrough, the Discontinuous Galerkin electronic structure method, to reach for the first time the required length and time scales to attain a detailed quantum mechanical understanding of the chemistry and dynamics at the SEI layer in

  11. Efficient parallel global garbage collection on massively parallel computers

    SciTech Connect (OSTI)

    Kamada, Tomio; Matsuoka, Satoshi; Yonezawa, Akinori

    1994-12-31

    On distributed-memory high-performance MPPs where processors are interconnected by an asynchronous network, efficient garbage collection (GC) becomes difficult due to inter-node references and references within pending, unprocessed messages. The parallel global GC algorithm (1) takes advantage of reference locality, (2) efficiently traverses references over nodes, (3) admits minimum pause time of ongoing computations, and (4) has been shown to scale up to 1024-node MPPs. The algorithm employs a global weight counting scheme to substantially reduce message traffic. Two methods for confirming the arrival of pending messages are used: one counts the number of messages and the other uses network "bulldozing." Performance evaluation in actual implementations on a multicomputer with 32-1024 nodes, the Fujitsu AP1000, reveals various favorable properties of the algorithm.
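    The weight-counting idea the abstract mentions can be illustrated with classic weighted reference counting, where duplicating a remote reference splits its weight locally (no message to the object's home node) and only dropping a reference sends a single decrement message. This is a generic single-process sketch of the technique, not the paper's actual protocol; all class and method names are invented.

```python
class RemoteObject:
    """Object on its home node; tracks the total outstanding weight."""
    def __init__(self, initial_weight=64):
        self.weight = initial_weight
        self.collected = False

    def return_weight(self, w):
        # A decrement message arriving from some remote node.
        self.weight -= w
        if self.weight == 0:
            self.collected = True  # no references remain anywhere

class Reference:
    """A remote reference carrying part of the object's weight."""
    def __init__(self, target, weight):
        self.target, self.weight = target, weight

    def duplicate(self):
        # Copying a reference splits its weight locally: no message
        # to the home node is needed, which is what cuts traffic.
        half = self.weight // 2
        assert half > 0, "weight exhausted: would need an indirection cell (omitted)"
        self.weight -= half
        return Reference(self.target, half)

    def drop(self):
        # Deleting a reference returns its weight in one message.
        self.target.return_weight(self.weight)

def make_object(initial_weight=64):
    obj = RemoteObject(initial_weight)
    return obj, Reference(obj, initial_weight)
```

    The object is collectable exactly when all the weight it handed out has come back, so increment traffic disappears entirely.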

  12. Routing performance analysis and optimization within a massively parallel computer

    DOE Patents [OSTI]

    Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen

    2013-04-16

    An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.

  13. A massively parallel fractional step solver for incompressible flows

    SciTech Connect (OSTI)

    Houzeaux, G.; Vazquez, M.; Aubry, R.; Cela, J.M.

    2009-09-20

    This paper presents a parallel implementation of fractional step solvers for the incompressible Navier-Stokes equations using an algebraic approach. Under this framework, predictor-corrector and incremental projection schemes are seen as sub-classes of the same class, making their differences and similarities apparent. An additional advantage of this approach is to set a common basis for a parallelization strategy, which can be extended to other splitting techniques or to compressible flows. The predictor-corrector scheme consists of solving the momentum equation and a modified 'continuity' equation (namely a simple iteration for the pressure Schur complement) consecutively in order to converge to the monolithic solution, thus avoiding fractional errors. On the other hand, the incremental projection scheme solves only one iteration of the predictor-corrector per time step and adds a correction equation to fulfill mass conservation. As shown in the paper, these two schemes are very well suited to massively parallel implementation. In fact, when compared with monolithic schemes, simpler solvers and preconditioners can be used to solve the non-symmetric momentum equations (GMRES, Bi-CGSTAB) and the symmetric continuity equation (CG, Deflated CG). This gives the algorithm good speedup properties. The implementation of the mesh partitioning technique is presented, as well as the parallel performance and speedups for thousands of processors.
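    For orientation, a standard incremental projection step has the following shape (generic textbook notation, not taken from the paper); it shows why the momentum system is non-symmetric while the pressure system is symmetric, matching the solver choices named above:

```latex
% One incremental projection step: predictor, pressure increment, correction.
\begin{align*}
  \frac{u^{*} - u^{n}}{\Delta t} + (u^{n}\!\cdot\!\nabla)\,u^{*}
      - \nu \nabla^{2} u^{*} &= -\nabla p^{n}
      && \text{momentum predictor (non-symmetric: GMRES, Bi-CGSTAB)}\\
  \nabla^{2} \delta p &= \frac{1}{\Delta t}\,\nabla\!\cdot u^{*}
      && \text{pressure `continuity' equation (symmetric: CG, Deflated CG)}\\
  u^{n+1} = u^{*} - \Delta t\,\nabla \delta p,
  \qquad p^{n+1} &= p^{n} + \delta p
      && \text{correction}
\end{align*}
```

    The predictor-corrector variant of the paper iterates the middle (pressure Schur complement) step to convergence within each time step instead of applying it once.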

  14. Comparing current cluster, massively parallel, and accelerated systems

    SciTech Connect (OSTI)

    Barker, Kevin J; Davis, Kei; Hoisie, Adolfy; Kerbyson, Darren J; Pakin, Scott; Lang, Mike; Sancho Pitarch, Jose C

    2010-01-01

    Currently there is large architectural diversity in high performance computing systems. They include 'commodity' cluster systems that optimize per-node performance for small jobs, massively parallel processors (MPPs) that optimize aggregate performance for large jobs, and accelerated systems that optimize both per-node and aggregate performance but only for applications custom-designed to take advantage of such systems. Because of these dissimilarities, meaningful comparisons of achievable performance are not straightforward. In this work we utilize a methodology that combines both empirical analysis and performance modeling to compare clusters (represented by a 4,352-core IB cluster), MPPs (represented by a 147,456-core BG/P), and accelerated systems (represented by the 129,600-core Roadrunner) across a workload of four applications. Strengths of our approach include the ability to compare architectures (as opposed to specific implementations of an architecture), to attribute each application's performance bottlenecks to characteristics unique to each system, and to explore performance scenarios in advance of their availability for measurement. Our analysis illustrates that application performance is essentially unrelated to relative peak performance, but that application performance can be both predicted and explained using modeling.

  15. Massively parallel processor networks with optical express channels

    DOE Patents [OSTI]

    Deri, R.J.; Brooks, E.D. III; Haigh, R.E.; DeGroot, A.J.

    1999-08-24

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination. 3 figs.

  16. Massively parallel processor networks with optical express channels

    DOE Patents [OSTI]

    Deri, Robert J. (Pleasanton, CA); Brooks, III, Eugene D. (Livermore, CA); Haigh, Ronald E. (Tracy, CA); DeGroot, Anthony J. (Castro Valley, CA)

    1999-01-01

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination.

  17. SWAMP+: multiple subsequence alignment using associative massive parallelism

    SciTech Connect (OSTI)

    Steinfadt, Shannon Irene [Los Alamos National Laboratory]; Baker, Johnnie W. [Kent State University]

    2010-10-18

    A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
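    The Smith-Waterman recurrence underlying SWAMP+, and the anti-diagonal independence that parallel implementations exploit, can be seen in a serial scoring-pass sketch. This is the textbook algorithm, not the SWAMP+ code itself; the scoring parameters are illustrative only.

```python
def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-1) -> int:
    """Best local-alignment score between strings a and b
    (scoring pass only; traceback omitted)."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are clamped at zero.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best
```

    Every cell on the anti-diagonal i + j = k depends only on diagonals k - 1 and k - 2, so with m processors an entire diagonal can be computed at once; sweeping the m + n - 1 diagonals is what yields the O(m + n) parallel time quoted above.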

  18. BlueGene/L Applications: Parallelism on a Massive Scale (Journal Article)

    Office of Scientific and Technical Information (OSTI)

    BlueGene/L (BG/L), developed through a partnership between IBM and Lawrence Livermore National Laboratory (LLNL), is currently the world's largest system both in terms of scale, with 131,072 processors, and absolute performance, with a peak rate of 367 TFlop/s. BG/L has led the Top500 list the last four times

  19. Analysis of gallium arsenide deposition in a horizontal chemical vapor deposition reactor using massively parallel computations

    SciTech Connect (OSTI)

    Salinger, A.G.; Shadid, J.N.; Hutchinson, S.A.

    1998-01-01

    A numerical analysis of the deposition of gallium from trimethylgallium (TMG) and arsine in a horizontal CVD reactor with tilted susceptor and a three inch diameter rotating substrate is performed. The three-dimensional model includes complete coupling between fluid mechanics, heat transfer, and species transport, and is solved using an unstructured finite element discretization on a massively parallel computer. The effects of three operating parameters (the disk rotation rate, inlet TMG fraction, and inlet velocity) and two design parameters (the tilt angle of the reactor base and the reactor width) on the growth rate and uniformity are presented. The nonlinear dependence of the growth rate uniformity on the key operating parameters is discussed in detail. Efficient and robust algorithms for massively parallel reacting flow simulations, as incorporated into our analysis code MPSalsa, make detailed analysis of this complicated system feasible.

  20. Massively parallel Monte Carlo for many-particle simulations on GPUs

    SciTech Connect (OSTI)

    Anderson, Joshua A.; Jankowski, Eric [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)]; Grubb, Thomas L. [Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)]; Engel, Michael [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)]; Glotzer, Sharon C., E-mail: sglotzer@umich.edu [Department of Chemical Engineering and Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)]

    2013-12-01

    Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
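    The checkerboard-style decomposition typically used to parallelize such Monte Carlo simulations can be sketched as follows. This is a schematic toy, not the authors' GPU kernel: the disk diameter, cell size, and move limit are assumptions chosen so that simultaneously updated cells can never interact, and the parity schedule stands in for the full detailed-balance machinery of the paper.

```python
import math
import random

SIGMA = 1.0   # disk diameter (assumed)
CELL = 4.0    # cell edge, chosen > SIGMA + 2*delta so that cells of the
              # same checkerboard parity can never interact

def sweep(cells, delta=0.1, rng=random):
    """One checkerboard sweep over a dict {(cx, cy): [(x, y), ...]}.
    The four cell parities are visited in turn; all cells of one parity
    are independent and could be updated concurrently (done serially
    here for clarity)."""
    for parity in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        for (cx, cy), disks in cells.items():
            if (cx % 2, cy % 2) != parity or not disks:
                continue
            i = rng.randrange(len(disks))
            x, y = disks[i]
            nx = x + rng.uniform(-delta, delta)
            ny = y + rng.uniform(-delta, delta)
            # Reject moves that leave the cell: this is what keeps
            # same-parity updates independent of each other.
            if not (cx * CELL <= nx < (cx + 1) * CELL
                    and cy * CELL <= ny < (cy + 1) * CELL):
                continue
            others = [q for ds in cells.values() for q in ds
                      if q is not disks[i]]
            # Hard-disk Metropolis rule: accept iff no overlap.
            if all(math.dist((nx, ny), q) >= SIGMA for q in others):
                disks[i] = (nx, ny)
```

    On a GPU, each active cell maps to a thread, and randomizing the parity schedule is part of what the paper's detailed-balance proof addresses.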

  1. Molecular Dynamics Simulations from SNL's Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)

    DOE Data Explorer [Office of Scientific and Technical Information (OSTI)]

    Plimpton, Steve; Thompson, Aidan; Crozier, Paul

    LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html. The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.

  2. CASL-U-2015-0170-000 SHIFT: A Massively Parallel Monte Carlo

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    SHIFT: A Massively Parallel Monte Carlo Radiation Transport Package Tara M. Pandya, Seth R. Johnson, Gregory G. Davidson, Thomas M. Evans, and Steven P. Hamilton Oak Ridge National Laboratory April 19, 2015 CASL-U-2015-0170-000 ANS MC2015 - Joint International Conference on Mathematics and Computation (M&C), Supercomputing in Nuclear Applications (SNA) and the Monte Carlo (MC) Method * Nashville, Tennessee * April 19-23, 2015, on CD-ROM, American Nuclear Society, LaGrange Park, IL (2015)

  3. A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities

    SciTech Connect (OSTI)

    O. Kononenko

    2015-02-17

    ACE3P is a 3D massively parallel simulation suite developed at SLAC National Accelerator Laboratory that can perform coupled electromagnetic, thermal, and mechanical studies. Effectively utilizing supercomputer resources, ACE3P has become a key simulation tool for particle accelerator R&D. A new frequency-domain solver to perform mechanical harmonic response analysis of accelerator components has been developed within the existing parallel framework. This solver is designed to determine the frequency response of the mechanical system to external harmonic excitations for time-efficient, accurate analysis of large-scale problems. Coupled with the ACE3P electromagnetic modules, this capability complements a set of multi-physics tools for a comprehensive study of microphonics in superconducting accelerating cavities, in order to understand the RF response and feedback requirements for the operational reliability of a particle accelerator.

  4. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  5. Capabilities, Implementation, and Benchmarking of Shift, a Massively Parallel Monte Carlo Radiation Transport Code

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Pandya, Tara M; Johnson, Seth R; Evans, Thomas M; Davidson, Gregory G; Hamilton, Steven P; Godfrey, Andrew T

    2016-01-01

    This work discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  6. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; Anderson, Steve; Woodward, Paul; Dietz, Hank

    1995-01-01

    Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.

  7. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    SciTech Connect (OSTI)

    Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel

    2009-02-15

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
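    Brandes' algorithm, the serial baseline that lock-free parallel betweenness codes build on, accumulates shortest-path counts in a forward BFS and dependencies in reverse BFS order. A plain-Python sketch for unweighted graphs follows; this is the textbook algorithm, not the authors' multithreaded implementation.

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted graph.
    adj: dict mapping vertex -> list of neighbours.
    Each undirected path is counted from both endpoints; halve the
    result for the conventional undirected normalization."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS recording path counts and predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies in reverse order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

    The outer loop over sources is embarrassingly parallel; the fine-grained, lock-free parallelism of the paper lies inside the per-source BFS and accumulation phases.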

  8. 3-D readout-electronics packaging for high-bandwidth massively paralleled imager

    DOE Patents [OSTI]

    Kwiatkowski, Kris (Los Alamos, NM); Lyke, James (Albuquerque, NM)

    2007-12-18

    Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells is disposed on a stacked CMOS chip's surface at a 45° angle from light-reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections distributing power and signals to components associated with each stacked CMOS chip layer.

  9. A Massively Parallel Sparse Eigensolver for Structural Dynamics Finite Element Analysis

    SciTech Connect (OSTI)

    Day, David M.; Reese, G.M.

    1999-05-01

    Eigenanalysis is a critical component of structural dynamics which is essential for determining the vibrational response of systems. This effort addresses the development of numerical algorithms associated with scalable eigensolver techniques suitable for use on massively parallel, distributed memory computers that are capable of solving large-scale structural dynamics problems. An iterative Lanczos method was determined to be the best choice for the application. Scalability of the eigenproblem depends on scalability of the underlying linear solver. A multi-level solver (FETI) was selected as most promising for this component. Issues relating to heterogeneous materials, mechanisms, and multipoint constraints have been examined, and the linear solver algorithm has been developed to incorporate features that result in a scalable, robust algorithm for practical structural dynamics applications. The resulting tools have been demonstrated on large problems representative of a weapons system.
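    The Lanczos iteration at the heart of such eigensolvers reduces a symmetric operator to a small tridiagonal matrix using only matrix-vector products, which is why it pairs naturally with a scalable parallel solver supplying the products. A dense-matrix toy sketch in plain Python (illustrative only, not the production solver; no reorthogonalization):

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos(A, v0, k):
    """k steps of Lanczos on symmetric A: returns the diagonal (alphas)
    and off-diagonal (betas) of the tridiagonal matrix T, whose extreme
    eigenvalues approximate those of A."""
    n = len(v0)
    nrm = math.sqrt(dot(v0, v0))
    v = [x / nrm for x in v0]
    v_prev = [0.0] * n
    alphas, betas = [], []
    beta = 0.0
    for _ in range(k):
        w = matvec(A, v)                      # the only use of A
        alpha = dot(w, v)
        # Three-term recurrence: orthogonalize against v and v_prev.
        w = [wi - alpha * vi - beta * pi
             for wi, vi, pi in zip(w, v, v_prev)]
        beta = math.sqrt(dot(w, w))
        alphas.append(alpha)
        betas.append(beta)
        if beta < 1e-12:                      # invariant subspace found
            break
        v_prev, v = v, [wi / beta for wi in w]
    return alphas, betas
```

    In a parallel structural dynamics code, `matvec` becomes a distributed (shifted, FETI-preconditioned) solve, so the eigensolver inherits the linear solver's scalability, as the abstract notes.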

  10. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

    2015-04-08

    The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.
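The reordering idea can be illustrated with a toy genetic search over task-to-node permutations. Everything here is an illustrative simplification rather than the authors' implementation: the Manhattan hop distance on a 2-D grid stands in for the real interconnect topology, and swap mutation with elitist selection stands in for the full genetic operators.

```python
import random

def comm_cost(perm, traffic, coords):
    # Total cost: traffic volume for each task pair, weighted by the
    # Manhattan hop distance between the nodes the tasks land on.
    cost = 0
    for (a, b), vol in traffic.items():
        xa, ya = coords[perm[a]]
        xb, yb = coords[perm[b]]
        cost += vol * (abs(xa - xb) + abs(ya - yb))
    return cost

def reorder_tasks(traffic, coords, n_tasks, gens=200, pop_size=20, seed=0):
    # Toy genetic search: perm[t] is the node assigned to task t.
    rng = random.Random(seed)
    pop = []
    for _ in range(pop_size):
        p = list(range(n_tasks))
        rng.shuffle(p)
        pop.append(p)
    for _ in range(gens):
        pop.sort(key=lambda p: comm_cost(p, traffic, coords))
        survivors = pop[: pop_size // 2]       # elitism: keep the best half
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.randrange(n_tasks), rng.randrange(n_tasks)
            child[i], child[j] = child[j], child[i]   # swap mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda p: comm_cost(p, traffic, coords))
```

On a 2x2 node grid with two heavily communicating task pairs, the search settles on a placement that puts each pair on adjacent nodes, which is the qualitative behavior the abstract reports at scale.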

  11. Massively parallel computation of 3D flow and reactions in chemical vapor deposition reactors

    SciTech Connect (OSTI)

    Salinger, A.G.; Shadid, J.N.; Hutchinson, S.A.; Hennigan, G.L.; Devine, K.D.; Moffat, H.K.

    1997-12-01

    Computer modeling of Chemical Vapor Deposition (CVD) reactors can greatly aid in the understanding, design, and optimization of these complex systems. Modeling is particularly attractive in these systems since the costs of experimentally evaluating many design alternatives can be prohibitively expensive, time consuming, and even dangerous when working with toxic chemicals like Arsine (AsH{sub 3}). Until now, predictive modeling has not been possible for most systems since the behavior is three-dimensional and governed by complex reaction mechanisms. In addition, CVD reactors often exhibit large thermal gradients, large changes in physical properties over regions of the domain, and significant thermal diffusion for gas mixtures with widely varying molecular weights. As a result, significant simplifications in the models have been made which erode the accuracy of the models' predictions. In this paper, the authors will demonstrate how the vast computational resources of massively parallel computers can be exploited to make possible the analysis of models that include coupled fluid flow and detailed chemistry in three-dimensional domains. For the most part, models have either simplified the reaction mechanisms and concentrated on the fluid flow, or have simplified the fluid flow and concentrated on rigorous reactions. An important CVD research thrust has been in detailed modeling of fluid flow and heat transfer in the reactor vessel, treating transport and reaction of chemical species either very simply or as a totally decoupled problem. Using the analogy between heat transfer and mass transfer, and the fact that deposition is often diffusion limited, much can be learned from these calculations; however, the effects of thermal diffusion, the change in physical properties with composition, and the incorporation of surface reaction mechanisms are not included in this model, nor can transitions to three-dimensional flows be detected.

  12. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOE Patents [OSTI]

    Karasick, M.S.; Strip, D.R.

    1996-01-30

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.
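The d-edge idea, one self-contained record per (edge, face) incidence, can be sketched as below. The dictionary layout and function name are illustrative stand-ins, not the patent's actual structure; the point is that each record carries both vertex descriptions and its one face, so a processor holding a record needs no neighbor communication.

```python
def build_dedges(faces):
    # One directed edge (d-edge) per boundary segment of each face:
    # walking each face's vertex ring emits (tail, head, face) records.
    dedges = []
    for face_id, verts in enumerate(faces):
        for i, v in enumerate(verts):
            w = verts[(i + 1) % len(verts)]   # wrap around the ring
            dedges.append({"tail": v, "head": w, "face": face_id})
    return dedges
```

A shared edge appears twice, once per incident face and with opposite orientation, which is what lets each processor operate on its own small component of the model independently.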

  13. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOE Patents [OSTI]

    Karasick, Michael S. (Ridgefield, CT); Strip, David R. (Albuquerque, NM)

    1996-01-01

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.

  14. Method and apparatus for obtaining stack traceback data for multiple computing nodes of a massively parallel computer system

    DOE Patents [OSTI]

    Gooding, Thomas Michael (Rochester, MN); McCarthy, Patrick Joseph (Rochester, MN)

    2010-03-02

    A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.
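The two-phase collection scheme described above can be sketched in a few lines of Python. The data shapes are assumptions for illustration: partial data is one sampled instruction address per node, and `fetch_full` is a hypothetical callback that retrieves a complete stack from one node.

```python
def group_nodes(partial):
    # Phase 1: group node ids by their sampled instruction address.
    groups = {}
    for node, addr in partial.items():
        groups.setdefault(addr, []).append(node)
    return groups

def collect_tracebacks(partial, fetch_full):
    # Phase 2: fetch a full call-return stack from only one
    # representative node per group, instead of from every node.
    result = {}
    for addr, nodes in group_nodes(partial).items():
        rep = min(nodes)  # deterministic representative choice
        result[addr] = (sorted(nodes), fetch_full(rep))
    return result
```

For tens of thousands of nodes that are mostly executing the same code, the number of distinct addresses, and hence full-stack retrievals, stays small.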

  15. Massively parallel processing on the Intel Paragon system: One tool in achieving the goals of the Human Genome Project

    SciTech Connect (OSTI)

    Ecklund, D.J.

    1993-12-31

    A massively parallel computing system is one tool that has been adopted by researchers in the Human Genome Project. This tool is one of many in a toolbox of theories, algorithms, and systems that are used to attack the many questions posed by the project. A good tool functions well when applied alone to the problem for which it was devised. A superior tool achieves its solitary goal, and supports and interacts with other tools to achieve goals beyond the scope of any individual tool. The author believes that Intel's massively parallel Paragon{trademark} XP/S system is a superior tool. This paper presents specific requirements for a superior computing tool for the Human Genome Project (HGP) and shows how the Paragon system addresses these requirements. Computing requirements for HGP are based on three factors: (1) computing requirements of algorithms currently used in sequence homology, protein folding, and database insertion/retrieval; (2) estimates of the computing requirements of new applications arising from evolving biological theories; and (3) the requirements for facilities that support collaboration among scientists in a project of this magnitude. The Paragon system provides many hardware and software features that effectively address these requirements.

  16. Analysis and selection of optimal function implementations in massively parallel computer

    DOE Patents [OSTI]

    Archer, Charles Jens (Rochester, MN); Peters, Amanda (Rochester, MN); Ratterman, Joseph D. (Rochester, MN)

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
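The mechanism can be sketched as a profile-then-dispatch pair of functions. This is a simplification under stated assumptions: performance data is gathered through a caller-supplied `timer`, the only "input dimension" is a single size parameter, and the generated selector simply picks the implementation that was fastest at the nearest profiled size; the patent's actual apparatus is not limited to this.

```python
def profile(impls, inputs, timer):
    # Record timer(impl, x) for every implementation at every sample input.
    return {name: {x: timer(f, x) for x in inputs} for name, f in impls.items()}

def make_selector(impls, perf):
    # Build a dispatcher that, for a given input size, calls the
    # implementation that was fastest at the nearest profiled size.
    sizes = sorted(next(iter(perf.values())))
    best_at = {x: min(perf, key=lambda name: perf[name][x]) for x in sizes}

    def selector(x, *args):
        nearest = min(sizes, key=lambda s: abs(s - x))
        return impls[best_at[nearest]](x, *args)

    return selector
```

The tagged return values in the test below only serve to make the dispatch decision observable.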

  17. Massively-parallel electron dynamics calculations in real-time and real-space: Toward applications to nanostructures of more than ten-nanometers in size

    SciTech Connect (OSTI)

    Noda, Masashi; Ishimura, Kazuya; Nobusada, Katsuyuki; Yabana, Kazuhiro; Boku, Taisuke

    2014-05-15

    A highly efficient program of massively parallel calculations for electron dynamics has been developed in an effort to apply the method to the optical response of nanostructures of more than ten nanometers in size. The approach is based on time-dependent density functional theory calculations in real-time and real-space. The computational code is implemented by using simple algorithms with a finite-difference method in space derivative and Taylor expansion in time-propagation. Since the computational program is free from the algorithms of eigenvalue problems and fast-Fourier-transformation, which are usually implemented in conventional quantum chemistry or band structure calculations, it is highly suitable for massively parallel calculations. Benchmark calculations using the K computer at RIKEN demonstrate that the parallel efficiency of the program is very high on more than 60 000 CPU cores. The method is applied to the optical response of ordered arrays of C{sub 60} nanostructures of more than 10 nm in size. The computed absorption spectrum is in good agreement with the experimental observation.
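The Taylor-expansion time propagation mentioned above approximates exp(-iH dt) by a truncated series, so a step needs nothing but repeated applications of H to a vector, no eigensolves, no FFTs, which is exactly why it parallelizes well. A minimal serial sketch (the function name is illustrative, and a real code would apply a finite-difference Hamiltonian on a 3-D grid rather than the trivial operator used in the test):

```python
def taylor_step(apply_h, psi, dt, order=4):
    # One step of psi(t+dt) = exp(-i*H*dt) psi(t), truncated at the
    # given order: result = sum_k (-i*dt)^k / k! * H^k psi.
    result = psi[:]
    term = psi[:]
    for k in range(1, order + 1):
        hterm = apply_h(term)
        term = [(-1j * dt / k) * x for x in hterm]   # builds (-i dt)^k / k!
        result = [r + t for r, t in zip(result, term)]
    return result
```

For the identity Hamiltonian the exact answer is exp(-i dt), and a fourth-order step at dt = 0.01 reproduces it to roughly machine precision while keeping the norm of the state.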

  18. Method and apparatus for analyzing error conditions in a massively parallel computer system by identifying anomalous nodes within a communicator set

    DOE Patents [OSTI]

    Gooding, Thomas Michael (Rochester, MN)

    2011-04-19

    An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system, and identifies nodes which exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes which do not themselves belong to the group. A node, not itself in the group, having a large number of neighbors in the group, is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to number of adjoining members of the group.
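The neighbor-count heuristic at the end of the abstract can be sketched directly. The inputs are assumptions for illustration: `in_group` is the set of nodes whose tracebacks matched, and `neighbors` is the interconnect adjacency.

```python
def suspect_nodes(in_group, neighbors):
    # Rank nodes outside the group by how many of their neighbors are
    # inside it: a non-member surrounded by members is a likely fault.
    scores = {}
    for node, nbrs in neighbors.items():
        if node in in_group:
            continue
        hits = sum(1 for n in nbrs if n in in_group)
        if hits:
            scores[node] = hits
    return sorted(scores, key=lambda n: -scores[n])
```

On a five-node chain where every node but the middle one behaves identically, the middle node is flagged first, mirroring the "likely locality of error" sorting the patent describes.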

  19. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by employing bandwidth shells at areas of overutilization

    DOE Patents [OSTI]

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-04-27

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a final destination. The default routing strategy is altered responsive to detection of overutilization of a particular path of one or more links, and at least some traffic is re-routed by distributing the traffic among multiple paths (which may include the default path). An alternative path may require a greater number of link traversals to reach the destination node.

  20. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by dynamic global mapping of contended links

    DOE Patents [OSTI]

    Archer, Charles Jens (Rochester, MN); Musselman, Roy Glenn (Rochester, MN); Peters, Amanda (Rochester, MN); Pinnow, Kurt Walter (Rochester, MN); Swartz, Brent Allen (Chippewa Falls, WI); Wallenfelt, Brian Paul (Eden Prairie, MN)

    2011-10-04

    A massively parallel nodal computer system periodically collects and broadcasts usage data for an internal communications network. A node sending data over the network makes a global routing determination using the network usage data. Preferably, network usage data comprises an N-bit usage value for each output buffer associated with a network link. An optimum routing is determined by summing the N-bit values associated with each link through which a data packet must pass, and comparing the sums associated with different possible routes.
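The route-selection rule, sum the broadcast N-bit usage values over each link a packet would traverse and pick the smallest sum, is simple enough to sketch directly. Routes as vertex lists and usage keyed by (src, dst) link tuples are illustrative assumptions:

```python
def route_cost(route, usage):
    # Sum the broadcast N-bit usage values over each link on the route.
    return sum(usage[link] for link in zip(route, route[1:]))

def best_route(routes, usage):
    # Pick the candidate route whose total congestion estimate is lowest.
    return min(routes, key=lambda r: route_cost(r, usage))
```

Note the winner need not be a shortest path: a longer route through lightly loaded links can have a lower sum, which is the flexibility the patent exploits.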

  1. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by routing through transporter nodes

    DOE Patents [OSTI]

    Archer, Charles Jens (Rochester, MN); Musselman, Roy Glenn (Rochester, MN); Peters, Amanda (Rochester, MN); Pinnow, Kurt Walter (Rochester, MN); Swartz, Brent Allen (Chippewa Falls, WI); Wallenfelt, Brian Paul (Eden Prairie, MN)

    2010-11-16

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a destination. Some packets are constrained to be routed through respective designated transporter nodes, the automated routing strategy determining a path from a respective source node to a respective transporter node, and from a respective transporter node to a respective destination node. Preferably, the source node chooses a routing policy from among multiple possible choices, and that policy is followed by all intermediate nodes. The use of transporter nodes allows greater flexibility in routing.

  2. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by dynamically adjusting local routing strategies

    DOE Patents [OSTI]

    Archer, Charles Jens (Rochester, MN); Musselman, Roy Glenn (Rochester, MN); Peters, Amanda (Rochester, MN); Pinnow, Kurt Walter (Rochester, MN); Swartz, Brent Allen (Chippewa Falls, WI); Wallenfelt, Brian Paul (Eden Prairie, MN)

    2010-03-16

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Each node implements a respective routing strategy for routing data through the network, the routing strategies not necessarily being the same in every node. The routing strategies implemented in the nodes are dynamically adjusted during application execution to shift network workload as required. Preferably, adjustment of routing policies in selective nodes is performed at synchronization points. The network may be dynamically monitored, and routing strategies adjusted according to detected network conditions.

  3. User's guide of TOUGH2-EGS-MP: A Massively Parallel Simulator with Coupled Geomechanics for Fluid and Heat Flow in Enhanced Geothermal Systems VERSION 1.0

    SciTech Connect (OSTI)

    Xiong, Yi; Fakcharoenphol, Perapon; Wang, Shihao; Winterfeld, Philip H.; Zhang, Keni; Wu, Yu-Shu

    2013-12-01

    TOUGH2-EGS-MP is a parallel numerical simulation program coupling geomechanics with fluid and heat flow in fractured and porous media, and is applicable for simulation of enhanced geothermal systems (EGS). TOUGH2-EGS-MP is based on the TOUGH2-MP code, the massively parallel version of TOUGH2. In TOUGH2-EGS-MP, the fully-coupled flow-geomechanics model is developed from linear elastic theory for thermo-poro-elastic systems and is formulated in terms of mean normal stress as well as pore pressure and temperature. Reservoir rock properties such as porosity and permeability depend on rock deformation, and the relationships between these two, obtained from poro-elasticity theories and empirical correlations, are incorporated into the simulation. This report provides the user with detailed information on the TOUGH2-EGS-MP mathematical model and instructions for using it for Thermal-Hydrological-Mechanical (THM) simulations. The mathematical model includes the fluid and heat flow equations, geomechanical equation, and discretization of those equations. In addition, the parallel aspects of the code, such as domain partitioning and communication between processors, are also included. Although TOUGH2-EGS-MP has the capability for simulating fluid and heat flows coupled with geomechanical effects, it is up to the user to select the specific coupling process, such as THM or only TH, in a simulation. There are several example problems illustrating applications of this program. These example problems are described in detail and their input data are presented. Their results demonstrate that this program can be used for field-scale geothermal reservoir simulation in porous and fractured media with fluid and heat flow coupled with geomechanical effects.

  4. CX-001635: Categorical Exclusion Determination

    Broader source: Energy.gov [DOE]

    Solar American Institute Incubator - Semprius - Massively Parallel Microcell-Based Module Array. CX(s) Applied: B3.6. Date: 04/08/2010. Location(s): Durham, North Carolina. Office(s): Energy Efficiency and Renewable Energy, Golden Field Office

  5. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by semi-randomly varying routing policies for different packets

    DOE Patents [OSTI]

    Archer, Charles Jens (Rochester, MN); Musselman, Roy Glenn (Rochester, MN); Peters, Amanda (Rochester, MN); Pinnow, Kurt Walter (Rochester, MN); Swartz, Brent Allen (Chippewa Falls, WI); Wallenfelt, Brian Paul (Eden Prairie, MN)

    2010-11-23

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Nodes vary a choice of routing policy for routing data in the network in a semi-random manner, so that similarly situated packets are not always routed along the same path. Semi-random variation of the routing policy tends to avoid certain local hot spots of network activity, which might otherwise arise using more consistent routing determinations. Preferably, the originating node chooses a routing policy for a packet, and all intermediate nodes in the path route the packet according to that policy. Policies may be rotated on a round-robin basis, selected by generating a random number, or otherwise varied.
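The policy-variation step at the originating node can be sketched as a small chooser that mixes round-robin rotation with occasional random jumps. The class name, the jump probability, and the policy labels are illustrative assumptions, not the patent's terms:

```python
import random

class PolicyChooser:
    # Semi-random policy selection at the originating node: rotate
    # round-robin through the policies, occasionally jumping at random
    # so similarly situated packets do not all follow one path.
    def __init__(self, policies, jump_prob=0.25, seed=0):
        self.policies = policies
        self.i = 0
        self.jump_prob = jump_prob
        self.rng = random.Random(seed)

    def choose(self):
        if self.rng.random() < self.jump_prob:
            self.i = self.rng.randrange(len(self.policies))
        else:
            self.i = (self.i + 1) % len(self.policies)
        return self.policies[self.i]
```

The chosen policy would then ride in the packet header so every intermediate node applies the same one, as the abstract specifies.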

  6. Parallel computing works

    SciTech Connect (OSTI)

    Not Available

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  7. High-speed massively parallel scanning

    DOE Patents [OSTI]

    Decker, Derek E. (Byron, CA)

    2010-07-06

    A new technique for recording a series of images of a high-speed event (such as, but not limited to: ballistics, explosives, laser induced changes in materials, etc.) is presented. Such technique(s) makes use of a lenslet array to take image picture elements (pixels) and concentrate light from each pixel into a spot that is much smaller than the pixel. This array of spots illuminates a detector region (e.g., film, as one embodiment) which is scanned transverse to the light, creating tracks of exposed regions. Each track is a time history of the light intensity for a single pixel. By appropriately configuring the array of concentrated spots with respect to the scanning direction of the detection material, different tracks fit between pixels and sufficient lengths are possible which can be of interest in several high-speed imaging applications.

  8. Parallel Dislocation Simulator

    Energy Science and Technology Software Center (OSTI)

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  9. Differences Between Distributed and Parallel Systems

    SciTech Connect (OSTI)

    Brightwell, R.; Maccabe, A.B.; Riesen, R.

    1998-10-01

    Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.

  10. Ultrascalable petaflop parallel supercomputer

    DOE Patents [OSTI]

    Blumrich, Matthias A. (Ridgefield, CT); Chen, Dong (Croton On Hudson, NY); Chiu, George (Cross River, NY); Cipolla, Thomas M. (Katonah, NY); Coteus, Paul W. (Yorktown Heights, NY); Gara, Alan G. (Mount Kisco, NY); Giampapa, Mark E. (Irvington, NY); Hall, Shawn (Pleasantville, NY); Haring, Rudolf A. (Cortlandt Manor, NY); Heidelberger, Philip (Cortlandt Manor, NY); Kopcsay, Gerard V. (Yorktown Heights, NY); Ohmacht, Martin (Yorktown Heights, NY); Salapura, Valentina (Chappaqua, NY); Sugavanam, Krishnan (Mahopac, NY); Takken, Todd (Brewster, NY)

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  11. TRANSIMS Parallelization

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    TRANSIMS Parallelization Background: TRANSIMS was originally developed by Los Alamos National Laboratory to run exclusively on a Linux cluster environment. In this initial version, the only parallelized component was the microsimulator. It worked

  12. RECIPIENT:Semprius

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Semprius U.S. DEPARTMENT OF ENERGY EERE PROJECT MANAGEMENT CENTER NEPA DETERMINATION Page 1 of 2 STATE: NC PROJECT TITLE: SAI Incubator - Semprius - Massively Parallel Microcell-based Module Array; NREL Tracking No. 09-036a Funding Opportunity Announcement Number Procurement Instrument Number NEPA Control Number CID Number NREL-09-036a G010337 Based on my review of the information concerning the proposed action, as NEPA Compliance Officer (authorized under DOE Order 451.1A), I have made

  13. Supertwistors and massive particles

    SciTech Connect (OSTI)

    Mezincescu, Luca; Routh, Alasdair J.; Townsend, Paul K.

    2014-07-15

    In the (super)twistor formulation of massless (super)particle mechanics, the mass-shell constraint is replaced by a spin-shell constraint from which the spin content can be read off. We extend this formalism to massive (super)particles (with N-extended spacetime supersymmetry) in three and four spacetime dimensions, explaining how the spin-shell constraints are related to spin, and we use it to prove equivalence of the massive N=1 and BPS-saturated N=2 superparticle actions. We also find the supertwistor form of the action for spinning particles with N-extended worldline supersymmetry, massless in four dimensions and massive in three dimensions, and we show how this simplifies special features of the N=2 case. -- Highlights: Spin-shell constraints are related to Poincaré Casimirs. Twistor form of 4D spinning particle for spin N/2. Twistor proof of scalar/antisymmetric tensor equivalence for 4D spin 0. Twistor form of 3D particle with arbitrary spin. Proof of equivalence of N=1 and N=2 BPS massive 4D superparticles.

  14. Special parallel processing workshop

    SciTech Connect (OSTI)

    1994-12-01

    This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts of parallel processing.

  15. Parallel Batch Scripts

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    parallel environments. Basic Parallel Example: If your job requires the default 5 GB of memory per slot, you can do the following: #!/bin/bash Set SGE options: -- ensure...

  16. Parallel Python GDB

    Energy Science and Technology Software Center (OSTI)

    2012-08-05

    PGDB is a lightweight parallel debugger software product. It utilizes the open-source GDB debugger inside a parallel Python framework.

  17. A Comprehensive Look at High Performance Parallel I/O

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    A Comprehensive Look at High Performance Parallel I/O Book Signing @ SC14! Nov. 18, 5 p.m. in Booth 1939 November 10, 2014 Contact: Linda Vu, +1 510 495 2402, lvu@lbl.gov In the 1990s, high performance computing (HPC) made a dramatic transition to massively parallel processors. As this model solidified over the next 20 years, supercomputing performance increased from gigaflops (billions of calculations per second) to

  18. Parallel flow diffusion battery

    DOE Patents [OSTI]

    Yeh, H.C.; Cheng, Y.S.

    1984-01-01

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  19. Parallel flow diffusion battery

    DOE Patents [OSTI]

    Yeh, Hsu-Chi (Albuquerque, NM); Cheng, Yung-Sung (Albuquerque, NM)

    1984-08-07

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  20. Parallel integrated thermal management

    DOE Patents [OSTI]

    Bennion, Kevin; Thornton, Matthew

    2014-08-19

    Embodiments discussed herein are directed to managing the heat content of two vehicle subsystems through a single coolant loop having parallel branches for each subsystem.

  1. UPC (Unified Parallel C)

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Description: Unified Parallel C is a partitioned global address space (PGAS) language and an extension of the C programming language. Availability: UPC is available on...

  2. Eclipse Parallel Tools Platform

    Energy Science and Technology Software Center (OSTI)

    2005-02-18

    Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs in these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to provide a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform-specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framework by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration, support for a small number of parallel architectures, and basic Fortran integration. Future versions will extend the functionality substantially, provide a number of core parallel tools, and provide support across a wide range of parallel architectures and languages.

  3. Parallel programming with PCN

    SciTech Connect (OSTI)

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).

  4. Thought Leaders during Crises in Massive Social Networks

    SciTech Connect (OSTI)

    Corley, Courtney D.; Farber, Robert M.; Reynolds, William

    2012-05-24

    The vast amount of social media data that can be gathered from the internet, coupled with workflows that utilize both commodity systems and massively parallel supercomputers, such as the Cray XMT, open new vistas for research to support health, defense, and national security. Computer technology now enables the analysis of graph structures containing more than 4 billion vertices joined by 34 billion edges, along with metrics and massively parallel algorithms that exhibit near-linear scalability with the number of processors. The challenge lies in making this massive data and analysis comprehensible to analysts and end-users who require actionable knowledge to carry out their duties. Simply stated, we have developed language and content agnostic techniques to reduce large graphs built from vast media corpora into forms people can understand. Specifically, our tools and metrics act as a survey tool to identify 'thought leaders' -- those members that lead or reflect the thoughts and opinions of an online community, independent of the source language.

  5. Parallel time integration software

    Energy Science and Technology Software Center (OSTI)

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.
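
The essence of the approach, a cheap sequential coarse propagator corrected by fine solves that can all run concurrently, can be sketched with a parareal-style two-level iteration (two-level MGRIT with F-relaxation reduces to parareal). The scalar model problem, step sizes, and forward-Euler propagators below are illustrative assumptions, not taken from the package:

```python
import numpy as np

# Model problem (an assumption for illustration): u' = lam*u, u(0) = 1.
lam, T, n, m = -1.0, 4.0, 64, 8      # n fine steps, coarsening factor m
dt, dT = T / n, m * T / n
nc = n // m                          # number of coarse intervals

def fine(u, steps):
    # Fine propagator: `steps` forward-Euler steps of size dt.
    return u * (1.0 + lam * dt) ** steps

def coarse(u):
    # Coarse propagator: one forward-Euler step of size dT.
    return u * (1.0 + lam * dT)

# Initial coarse-grid guess from the cheap sequential coarse sweep.
u = np.zeros(nc + 1)
u[0] = 1.0
for i in range(nc):
    u[i + 1] = coarse(u[i])

for k in range(nc):
    # Fine propagation of every coarse interval is independent:
    # this is the work that runs in parallel across time.
    F = np.array([fine(u[i], m) for i in range(nc)])
    # Sequential coarse correction sweep.
    u_new = np.zeros_like(u)
    u_new[0] = 1.0
    for i in range(nc):
        u_new[i + 1] = coarse(u_new[i]) + F[i] - coarse(u[i])
    u = u_new

serial = fine(1.0, n)   # sequential time marching, for comparison
```

The list comprehension building `F` is the parallelizable work; after at most `nc` iterations the time-parallel solution matches sequential time marching to machine precision, and the method pays off when it converges in far fewer iterations than that.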

  6. Parallel optical sampler

    DOE Patents [OSTI]

    Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A

    2014-05-20

    An optical sampler includes first and second 1×n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode-interconnected Mach-Zehnder modulator. A method of sampling the optical analog input signal is disclosed.

  7. Parallel Tensor Compression for Large-Scale Scientific Data.

    SciTech Connect (OSTI)

    Kolda, Tamara G.; Ballard, Grey; Austin, Woody Nathan

    2015-10-01

    As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
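
As a serial illustration of the underlying decomposition (the paper's contribution is the distributed-memory parallelization, which this sketch does not attempt), a truncated higher-order SVD computes Tucker factor matrices and a core tensor; the tensor sizes and ranks here are arbitrary assumptions:

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: move `mode` to the front, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    # Multiply tensor T by matrix M along `mode`.
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def tucker_hosvd(T, ranks):
    # Truncated HOSVD: leading left singular vectors of each unfolding
    # give the factor matrices; projecting T onto them gives the core.
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = T
    for m, U in enumerate(factors):
        core = mode_multiply(core, U.T, m)
    return core, factors

def tucker_reconstruct(core, factors):
    T = core
    for m, U in enumerate(factors):
        T = mode_multiply(T, U, m)
    return T

# A 20x20x20 tensor with exact multilinear rank (4, 4, 4).
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4, 4))
Us = [np.linalg.qr(rng.standard_normal((20, 4)))[0] for _ in range(3)]
T = tucker_reconstruct(G, Us)

core, factors = tucker_hosvd(T, (4, 4, 4))
err = np.linalg.norm(T - tucker_reconstruct(core, factors)) / np.linalg.norm(T)
ratio = T.size / (core.size + sum(U.size for U in factors))  # compression ratio
```

For this exactly low-rank tensor the relative reconstruction error is at machine precision while storing roughly 26x fewer numbers; on real simulation data the truncation ranks trade compression against accuracy.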

  8. Parallel programming with Ada

    SciTech Connect (OSTI)

    Kok, J.

    1988-01-01

    To the human programmer the ease of coding distributed computing is highly dependent on the suitability of the employed programming language. But with a particular language it is also important whether the possibilities of one or more parallel architectures can efficiently be addressed by available language constructs. In this paper the possibilities are discussed of the high-level language Ada and in particular of its tasking concept as a descriptional tool for the design and implementation of numerical and other algorithms that allow execution of parts in parallel. Language tools are explained and their use for common applications is shown. Conclusions are drawn about the usefulness of several Ada concepts.

  9. Applications of Parallel Computers

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Applications of Parallel Computers UCB CS267 Spring 2015 Tuesday & Thursday, 9:30-11:00 Pacific Time Applications of Parallel Computers, CS267, is a graduate-level course offered at the University of California, Berkeley. The course is being taught by UC Berkeley professor and LBNL Faculty Scientist Jim Demmel. CS267 is broadcast live over the internet and all NERSC users are invited to monitor the broadcast course, but course credit is available only to students registered for the

  10. Parallel Total Energy

    Energy Science and Technology Software Center (OSTI)

    2004-10-21

    This is a total energy electronic structure code using the Local Density Approximation (LDA) of density functional theory. It uses plane waves as the wave function basis set. It can use both norm-conserving pseudopotentials and ultrasoft pseudopotentials. It can relax the atomic positions according to the total energy. It is a parallel code using MPI.

  11. Parallel Multigrid Equation Solver

    Energy Science and Technology Software Center (OSTI)

    2001-09-07

    Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved problems of up to 76 million degrees of freedom, problems in linear elasticity on the ASCI Blue Pacific and ASCI Red machines.

  12. Parallel programming with PCN

    SciTech Connect (OSTI)

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  13. Parallel grid population

    DOE Patents [OSTI]

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
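
The two-phase scheme described above can be sketched in serial for a 1-D grid (the patent covers general grids; the interval objects, extent, and round-robin object split here are illustrative assumptions):

```python
from collections import defaultdict

def populate_grid(objects, n_parts, extent=100.0):
    # Phase 1: each "processor" p takes a distinct set of objects
    # (round-robin here) and determines which grid portions at least
    # partially bound each of its objects.
    width = extent / n_parts
    claims = defaultdict(list)           # portion index -> objects
    for p in range(n_parts):
        for lo, hi in objects[p::n_parts]:
            first = max(0, int(lo // width))
            last = min(n_parts - 1, int(hi // width))
            for portion in range(first, last + 1):
                claims[portion].append((lo, hi))
    # Phase 2: each processor populates its own distinct grid portion
    # with the objects previously determined to be bounded by it.
    return {p: sorted(claims[p]) for p in range(n_parts)}

grid = populate_grid([(5, 15), (42, 44), (98, 99.5), (10, 35)], n_parts=4)
# The object (10, 35) straddles the portion boundary at 25, so it is
# recorded in both portion 0 and portion 1.
```

Splitting the work this way keeps both phases embarrassingly parallel: no processor needs another's objects in phase 1, and no processor writes to another's grid portion in phase 2.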

  14. Xyce parallel electronic simulator.

    SciTech Connect (OSTI)

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  15. Exploiting Network Parallelism

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Exploiting Network Parallelism for Improving Data Transfer Performance Dan Gunter ∗ , Raj Kettimuthu † , Ezra Kissel ‡ , Martin Swany ‡ , Jun Yi § , Jason Zurawski ¶ ∗ Advanced Computing for Science Department, Lawrence Berkeley National Laboratory, Berkeley, CA † Mathematics and Computer Science Division, Argonne National Laboratory Argonne, IL ‡ School of Informatics and Computing, Indiana University, Bloomington, IN § Computation Institute, University of Chicago/Argonne

  16. Allinea DDT as a Parallel Debugging Alternative to Totalview

    SciTech Connect (OSTI)

    Antypas, K.B.

    2007-03-05

    Totalview, from the Etnus Corporation, is a sophisticated and feature-rich software debugger for parallel applications. As Totalview has gained in popularity and market share, its pricing model has increased to the point where it is often prohibitively expensive for massively parallel supercomputers. Additionally, many of Totalview's advanced features are not used by members of the scientific computing community. For these reasons, supercomputing centers have begun to search for a basic parallel debugging tool which can be used as an alternative to Totalview. DDT (Distributed Debugging Tool) from Allinea Software is a relatively new parallel debugging tool which aims to provide much of the same functionality as Totalview. This review outlines the basic features and limitations of DDT to determine if it can be a reasonable substitute for Totalview. DDT was tested on the NERSC platforms Bassi, Seaborg, Jacquard and Davinci with Fortran90, C, and C++ codes using MPI and OpenMP for parallelism.

  17. Parallel ptychographic reconstruction

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-12-19

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.

  18. Parallel ptychographic reconstruction

    SciTech Connect (OSTI)

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-12-19

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.

  19. Unified Parallel Software

    Energy Science and Technology Software Center (OSTI)

    2003-12-01

    UPS (Unified Parallel Software) is a collection of software tools (libraries, scripts, executables) that assist in parallel programming. This consists of: o libups.a C/Fortran callable routines for message passing (utilities written on top of MPI) and file IO (utilities written on top of HDF). o libuserd-HDF.so EnSight user-defined reader for visualizing data files written with UPS File IO. o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl Executables/scripts to get information from data files and to simplify the use of EnSight on those data files. o ups_io_rm/ups_io_cp Manipulate data files written with UPS File IO. These tools are portable to a wide variety of Unix platforms.

  20. A Massive Stellar Burst Before the Supernova

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    A Massive Stellar Burst Before the Supernova February 6, 2013 Contact: Linda Vu, lvu@lbl.gov, +1 510 495 2402 An automated supernova hunt is shedding new light on the death sequence of massive stars, specifically, the kind that self-destruct in Type IIn supernova explosions. Digging through the Palomar Transient Factory (PTF) data archive housed at the Department of Energy's National Energy Research Scientific Computing Center (NERSC) at Lawrence

  1. Parallel Computing Summer Research Internship

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Parallel Computing Summer Research Internship Creates next-generation leaders in HPC research and applications development Contacts Program Co-Lead Robert (Bob) Robey Email Program Co-Lead Gabriel Rockefeller Email Program Co-Lead Hai Ah Nam Email Professional Staff Assistant Nickole Aguilar Garcia (505) 665-3048 Email The Parallel Computing Summer Research Internship is an intense 10 week program aimed at providing students with a solid foundation in modern high performance

  2. BlueGene/L Applications: Parallelism on a Massive Scale (Journal...

    Office of Scientific and Technical Information (OSTI)

    131,072 processors and absolute performance with a peak rate of 367 TFlops. BGL has led the Top500 list the last four times with a Linpack rate of 280.6 TFlops for the full ...

  3. A new quasidilaton theory of massive gravity (Journal Article...

    Office of Scientific and Technical Information (OSTI)

    A new quasidilaton theory of massive gravity Citation Details In-Document Search Title: A new quasidilaton theory of massive gravity We present a new quasidilaton theory of...

  4. Massive Hanford Test Reactor Removed - Plutonium Recycle Test...

    Office of Environmental Management (EM)

    Massive Hanford Test Reactor Removed - Plutonium Recycle Test Reactor removed from Hanford's 300 Area ...

  5. SDSS-III: Massive Spectroscopic Surveys of the Distant Universe...

    Office of Scientific and Technical Information (OSTI)

    Massive Spectroscopic Surveys of the Distant Universe, the Milky Way Galaxy, and Extra-Solar Planetary Systems Citation Details In-Document Search Title: SDSS-III: Massive...

  6. A two-level parallel direct search implementation for arbitrarily sized objective functions

    SciTech Connect (OSTI)

    Hutchinson, S.A.; Shadid, N.; Moffat, H.K.

    1994-12-31

    In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p{sub 2}) of processors. If p, the number of processors available, is greater than or equal to 2p{sub 2}, then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.
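
The batching logic, evaluating as many poll points concurrently as there are processor groups of size p{sub 2}, can be sketched with a toy pattern search; the objective function, group count, and step-shrink schedule below are illustrative assumptions, and each group's evaluation is written serially where the paper runs it on p{sub 2} processors:

```python
def parallel_direct_search(f, x0, step=1.0, groups=4, tol=1e-6, max_iter=200):
    # Toy pattern search: each iteration polls +/-step along every
    # coordinate; poll points are evaluated in batches of `groups`,
    # one point per processor group. The step halves when no poll
    # point improves on the incumbent.
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        polls = []
        for i in range(len(x)):
            for s in (step, -step):
                y = x[:]
                y[i] += s
                polls.append(y)
        improved = False
        for start in range(0, len(polls), groups):
            batch = polls[start:start + groups]
            vals = [f(y) for y in batch]     # one evaluation per group
            best = min(range(len(vals)), key=vals.__getitem__)
            if vals[best] < fx:
                x, fx, improved = batch[best], vals[best], True
                break
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return x, fx

xmin, fmin = parallel_direct_search(lambda v: (v[0] - 3) ** 2 + (v[1] + 1) ** 2,
                                    [0.0, 0.0])
```

With `groups` set to p // p{sub 2}, a larger group count means more poll points per batch and hence more search-level parallelism on top of the per-evaluation parallelism.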

  7. A two-level parallel direct search implementation for arbitrarily sized objective functions

    SciTech Connect (OSTI)

    Hutchinson, S.A.; Shadid, J.N.; Moffat, H.K.; Ng, K.T.

    1994-02-21

    In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p{sub 2}) of processors. If p, the number of processors available, is greater than or equal to 2p{sub 2}, then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.

  8. Dynamic Star Formation in the Massive DR21 Filament

    SciTech Connect (OSTI)

    Schneider, N.; Csengeri, T.; Bontemps, S.; Motte, F.; Simon, R.; Hennebelle, P.; Federrath, C.; Klessen, R.; /ZAH, Heidelberg /KIPAC, Menlo Park

    2010-08-25

    The formation of massive stars is a highly complex process in which it is unclear whether the star-forming gas is in global gravitational collapse or an equilibrium state supported by turbulence and/or magnetic fields. By studying one of the most massive and dense star-forming regions in the Galaxy at a distance of less than 3 kpc, i.e. the filament containing the well-known sources DR21 and DR21(OH), we attempt to obtain observational evidence to help us to discriminate between these two views. We use molecular line data from our {sup 13}CO 1 {yields} 0, CS 2 {yields} 1, and N{sub 2}H{sup +} 1 {yields} 0 survey of the Cygnus X region obtained with the FCRAO and CO, CS, HCO{sup +}, N{sub 2}H{sup +}, and H{sub 2}CO data obtained with the IRAM 30m telescope. We observe a complex velocity field and velocity dispersion in the DR21 filament in which regions of the highest column-density, i.e., dense cores, have a lower velocity dispersion than the surrounding gas and velocity gradients that are not (only) due to rotation. Infall signatures in optically thick line profiles of HCO{sup +} and {sup 12}CO are observed along and across the whole DR21 filament. By modelling the observed spectra, we obtain a typical infall speed of {approx}0.6 km s{sup -1} and mass accretion rates of the order of a few 10{sup -3} M{sub {circle_dot}} yr{sup -1} for the two main clumps constituting the filament. These massive clumps (4900 and 3300 M{sub {circle_dot}} at densities of around 10{sup 5} cm{sup -3} within 1 pc diameter) are both gravitationally contracting. The more massive of the clumps, DR21(OH), is connected to a sub-filament, apparently 'falling' onto the clump. This filament runs parallel to the magnetic field. Conclusions. 
All observed kinematic features in the DR21 filament (velocity field, velocity dispersion, and infall), its filamentary morphology, and the existence of (a) sub-filament(s) can be explained if the DR21 filament was formed by the convergence of flows on large scales and is now in a state of global gravitational collapse. Whether this convergence of flows originated from self-gravity on larger scales or from other processes cannot be determined by the present study. The observed velocity field and velocity dispersion are consistent with results from (magneto)-hydrodynamic simulations where the cores lie at the stagnation points of convergent turbulent flows.

  9. Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P

    SciTech Connect (OSTI)

    Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Syratchev, I.; /CERN

    2009-06-19

    In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).

  10. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  11. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  12. Computing contingency statistics in parallel.

    SciTech Connect (OSTI)

    Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

    2010-09-01

    Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and {chi}{sup 2} independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
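
The map-reduce structure described here, per-processor partial tables that merge by addition, followed by derived statistics, can be sketched as follows; the two-shard split and the chi-square formula are a toy illustration, not the paper's implementation:

```python
from collections import Counter
from itertools import product

def partial_table(records):
    # Map step: a per-processor contingency table is a count of
    # (x, y) pairs, so partial tables merge by simple addition.
    return Counter(records)

def merge(tables):
    total = Counter()
    for t in tables:
        total += t
    return total

def derived_stats(table):
    # Reduce step: marginals, then the chi-square independence
    # statistic from observed vs. expected (product of marginals).
    n = sum(table.values())
    px, py = Counter(), Counter()
    for (x, y), c in table.items():
        px[x] += c
        py[y] += c
    chi2 = sum((table.get((x, y), 0) - px[x] * py[y] / n) ** 2
               / (px[x] * py[y] / n)
               for x, y in product(px, py))
    return n, chi2

# Two "processors" each see one shard of the data.
shard1 = [("a", 0), ("a", 0), ("b", 1)]
shard2 = [("a", 0), ("b", 1), ("b", 1)]
table = merge([partial_table(shard1), partial_table(shard2)])
n, chi2 = derived_stats(table)
```

The communication cost the paper worries about is visible here: the merge step ships whole partial tables, whose size grows with the number of distinct (x, y) categories rather than staying fixed as with moment-based statistics.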

  13. Growth histories in bimetric massive gravity

    SciTech Connect (OSTI)

    Berg, Marcus; Buchberger, Igor; Enander, Jonas; Mörtsell, Edvard; Sjörs, Stefan E-mail: igor.buchberger@kau.se E-mail: edvard@fysik.su.se

    2012-12-01

    We perform cosmological perturbation theory in Hassan-Rosen bimetric gravity for general homogeneous and isotropic backgrounds. In the de Sitter approximation, we obtain decoupled sets of massless and massive scalar gravitational fluctuations. Matter perturbations then evolve like in Einstein gravity. We perturb the future de Sitter regime by the ratio of matter to dark energy, producing quasi-de Sitter space. In this more general setting the massive and massless fluctuations mix. We argue that in the quasi-de Sitter regime, the growth of structure in bimetric gravity differs from that of Einstein gravity.

  14. NON-AQUEOUS DISSOLUTION OF MASSIVE PLUTONIUM

    DOE Patents [OSTI]

    Reavis, J.G.; Leary, J.A.; Walsh, K.A.

    1959-05-12

    A method is presented for obtaining non-aqueous solutions of plutonium from massive forms of the metal. In the present invention massive plutonium is added to a salt melt consisting of 10 to 40 weight per cent of sodium chloride and the balance zinc chloride. The plutonium reacts at about 800 deg C with the zinc chloride to form a salt bath of plutonium trichloride, sodium chloride, and metallic zinc. The zinc is separated from the salt melt by forcing the molten mixture through a Pyrex filter.

  15. Primordial Li abundance and massive particles

    SciTech Connect (OSTI)

    Đapo, H.

    2012-10-20

    The problem of the observed lithium abundance coming from Big Bang Nucleosynthesis is as yet unsolved. One of the proposed solutions is including relic massive particles in the Big Bang Nucleosynthesis. We investigated the effects of such particles on {sup 4}HeX{sup -}+{sup 2}H{yields}{sup 6}Li+X{sup -}, where X{sup -} is the negatively charged massive particle. We demonstrate the dominance of the long-range part of the potential in the cross-section.

  16. Parallelization

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    to equation (1.14) can be written for the term of J × B. The finite-difference approximation for the radial derivative in this equation is then ...

  17. Designing a parallel simula machine

    SciTech Connect (OSTI)

    Papazoglou, M.P.; Georgiadis, P.I.; Maritsas, D.G.

    1983-10-01

    The parallel simula machine (PSM) architecture is based upon a master/slave topology, incorporating a master microprocessor. Interconnection circuitry between the master and slave processor modules uses a timesharing system bus and various programmable interrupt control units. Common and private memory modules reside in the PSM, and direct memory access transfers ease the master processor's workload. 5 references.

  18. Parallel Power Grid Simulation Toolkit

    Energy Science and Technology Software Center (OSTI)

    2015-09-14

    ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete event parallel simulations. The code is designed using modern object-oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  19. Search for massive resonances in dijet systems containing jets...

    Office of Scientific and Technical Information (OSTI)

    massive resonances in dijet systems containing jets tagged as W or Z boson decays in pp collisions at √s = 8 TeV Re-direct Destination: Search for massive resonances in dijet...

  20. Scientists say climate change could cause a 'massive' tree die...

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Climate change could cause a 'massive' tree die-off in the U.S. Southwest. In a troubling...

  1. Parallelization and checkpointing of GPU applications through program transformation

    SciTech Connect (OSTI)

    Solano-Quinde, Lizandro Damián

    2012-11-15

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for GPUs tractable has consolidated GPUs as an accelerator platform. Among the areas that have benefited from GPU acceleration are signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) industry. To continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized to run on multi-GPU systems. Furthermore, multi-GPU systems help overcome the GPU memory limitation for applications with large memory footprints. Parallelizing single-GPU applications has been approached with libraries that distribute the workload at runtime; however, these impose execution overhead and are not portable. On traditional CPU systems, by contrast, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at the application level and avoids the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine today, GPUs also raise reliability concerns: they are vulnerable to transient and permanent failures, and current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems presents new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed.
The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.

  2. Parallel continuation-based global optimization for molecular conformation and protein folding

    SciTech Connect (OSTI)

    Coleman, T.F.; Wu, Z. [Cornell Univ., Ithaca, NY (United States)]

    1994-12-31

    This paper presents the authors' recent work on developing parallel algorithms and software for solving the global minimization problem for molecular conformation, especially protein folding. Global minimization problems are difficult to solve when the objective functions have many local minimizers, such as the energy functions for protein folding. In their approach, to avoid directly minimizing a "difficult" function, a special integral transformation is introduced to transform the function into a class of gradually deformed, but "smoother" or "easier" functions. An optimization procedure is then applied to the new functions successively, to trace their solutions back to the original function. The method can be applied to a large class of nonlinear partially separable functions, including energy functions for molecular conformation and protein folding. Mathematical theory for the method, as a special continuation approach to global optimization, is established. Algorithms with different solution-tracing strategies are developed. Different levels of parallelism are exploited in the implementation of the algorithms on massively parallel architectures.
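    The continuation idea, deforming a hard objective into smoother surrogates and tracing minimizers back to the original function, can be illustrated on a one-dimensional toy problem for which Gaussian smoothing is analytic. This is a hedged sketch of the general technique only; the functions and parameters below are invented for illustration and are not the authors' integral transformation or their parallel software:

```python
import math

def f(x, a=2.0, b=3.0):
    """Toy multimodal objective: a quadratic bowl plus an oscillation."""
    return x * x + a * math.sin(b * x)

def grad_smoothed(x, sigma, a=2.0, b=3.0):
    """Gradient of the Gaussian-smoothed objective.

    For this particular f the smoothing integral is analytic:
        f_sigma(x) = x**2 + sigma**2 + a*sin(b*x)*exp(-(b*sigma)**2 / 2)
    so the oscillation is damped away at large sigma."""
    damp = math.exp(-0.5 * (b * sigma) ** 2)
    return 2.0 * x + a * b * math.cos(b * x) * damp

def continuation_minimize(x0, sigmas=(2.0, 1.0, 0.5, 0.25, 0.0),
                          lr=0.05, steps=500):
    """Minimize each smoothed function in turn, warm-starting from the last."""
    x = x0
    for sigma in sigmas:
        for _ in range(steps):
            x -= lr * grad_smoothed(x, sigma)
    return x

x_star = continuation_minimize(2.5)                   # traced to the global minimum
x_naive = continuation_minimize(2.5, sigmas=(0.0,))   # plain descent: gets stuck
print(round(f(x_star), 2), round(f(x_naive), 2))      # -1.75 0.22
```

    Plain descent from the same start point stalls in a local minimum (f ≈ 0.22), while the continuation run tracks the smoothed minimizer down to the global one (f ≈ -1.75).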

  3. Cosmology in general massive gravity theories

    SciTech Connect (OSTI)

    Comelli, D.; Nesti, F.; Pilo, L. E-mail: fabrizio.nesti@aquila.infn.it

    2014-05-01

    We study the flat FRW cosmological solutions generated in general massive gravity theories. Such models are obtained by adding to the Einstein General Relativity action a peculiar non-derivative potential, a function of the metric components, that induces the propagation of five gravitational degrees of freedom. This large class of theories includes both the case with a residual Lorentz invariance and the case with rotational invariance only. It turns out that the Lorentz-breaking case is selected as the only possibility. Moreover, perturbations around strict Minkowski or dS space turn out to be strongly coupled. The upshot is that even though dark energy can simply be accounted for by massive gravity modifications, its equation of state w{sub eff} has to deviate from -1. Indeed, there is an explicit relation between the strong coupling scale of perturbations and the deviation of w{sub eff} from -1. Taking into account current limits on w{sub eff}, and submillimeter tests of Newton's law as a limit on the possible strong coupling scale, we find that it is still possible to have a weakly coupled theory in a quasi-dS background. Future experimental improvements in short-distance tests of Newton's law may be used to tighten the deviation of w{sub eff} from -1 in a weakly coupled massive gravity theory.

  4. Optimize Parallel Pumping Systems | Department of Energy

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Optimize Parallel Pumping Systems Optimize Parallel Pumping Systems This tip sheet describes how to optimize the performance of multiple pumps operating continuously as part of a parallel pumping system. PUMPING SYSTEMS TIP SHEET #8 PDF icon Optimize Parallel Pumping Systems (October 2006) More Documents & Publications Select an Energy-Efficient Centrifugal Pump Match Pumps to System Requirements Improving Pumping System Performance: A Sourcebook for Industry - Second Edition

  5. Xyce parallel electronic simulator design.

    SciTech Connect (OSTI)

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the ground up to be a SPICE-compatible, distributed-memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator, so having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines, and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus, and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort involving a number of researchers, engineers, scientists, mathematicians, and computer scientists. In addition to diversity of background, a certain amount of staff turnover is to be expected on long-term projects as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally record, in one place, a number of the software quality practices followed by the Xyce team. It is also hoped that this document will be a good source of information for new developers.

  6. A Novel Application of Parallel Betweenness Centrality to Power Grid Contingency Analysis

    SciTech Connect (OSTI)

    Jin, Shuangshuang; Huang, Zhenyu; Chen, Yousu; Chavarría-Miranda, Daniel; Feo, John T.; Wong, Pak C.

    2010-04-19

    In Energy Management Systems, contingency analysis is commonly performed to identify and mitigate potentially harmful power grid component failures. The exponentially increasing combinatorial number of failure modes imposes a significant computational burden on massive contingency analysis. It is critical to select a limited set of high-impact contingency cases within the constraints of computing power and time to make real-time power system vulnerability assessment possible. In this paper, we present a novel application of parallel betweenness centrality to power grid contingency selection. We cross-validate the proposed method using the model and data of the western US power grid, and implement it on a Cray XMT system - a massively multithreaded architecture - leveraging its advantages for parallel execution of irregular algorithms such as graph analysis. We achieve a speedup of 55 times (on 64 processors) compared against the single-processor version of the same code running on the Cray XMT. We also compare against an OpenMP-based version of the same code running on an HP Superdome shared-memory machine. The Cray XMT code shows better scalability and resource utilization, and shorter execution time for large-scale power grids. The proposed approach has been evaluated in PNNL's Electricity Infrastructure Operations Center (EIOC). It is expected to provide a quick and efficient solution to massive contingency selection problems, helping power grid operators identify and mitigate potential widespread cascading power grid failures in real time.
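    Betweenness centrality itself is standard: Brandes' algorithm runs one shortest-path sweep per source and accumulates, for every node, the fraction of shortest paths passing through it. The serial Python sketch below shows the metric on a toy graph; the paper's contribution, massively multithreaded execution on the Cray XMT over a power grid model, is not reproduced here:

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm: node betweenness on an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma)
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate pair dependencies
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every unordered pair was counted from both endpoints
    return {v: b / 2.0 for v, b in score.items()}

# Toy network: a path 0-1-2-3; interior nodes carry all through-traffic
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
bc = betweenness(adj)
ranked = sorted(bc, key=bc.get, reverse=True)
print(bc[1], bc[2], bc[0])   # 2.0 2.0 0.0
```

    Ranking by centrality and keeping the top-k nodes is the contingency-selection step: components whose loss disconnects many shortest paths are the high-impact candidates.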

  7. A new quasidilaton theory of massive gravity

    SciTech Connect (OSTI)

    Mukohyama, Shinji

    2014-12-01

    We present a new quasidilaton theory of Poincaré-invariant massive gravity, based on the recently proposed framework of matter coupling that makes it possible for the kinetic energy of the quasidilaton scalar to couple to both physical and fiducial metrics simultaneously. We find a scaling-type exact solution describing a self-accelerating de Sitter universe, and then analyze linear perturbations around it. It is shown that in a range of parameters all physical degrees of freedom have non-vanishing quadratic kinetic terms and are stable in the subhorizon limit, while the effective Newton's constant for the background is kept positive.

  8. Device for balancing parallel strings

    DOE Patents [OSTI]

    Mashikian, Matthew S. (Storrs, CT)

    1985-01-01

    A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.

  9. Hybrid Optimization Parallel Search PACKage

    Energy Science and Technology Software Center (OSTI)

    2009-11-10

    HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.

  10. Information hiding in parallel programs

    SciTech Connect (OSTI)

    Foster, I.

    1992-01-30

    A fundamental principle in program design is to isolate difficult or changeable design decisions. Application of this principle to parallel programs requires identification of decisions that are difficult or subject to change, and the development of techniques for hiding these decisions. We experiment with three complex applications, and identify mapping, communication, and scheduling as areas in which decisions are particularly problematic. We develop computational abstractions that hide such decisions, and show that these abstractions can be used to develop elegant solutions to programming problems. In particular, they allow us to encode common structures, such as transforms, reductions, and meshes, as software cells and templates that can be reused in different applications. An important characteristic of these structures is that they do not incorporate mapping, communication, or scheduling decisions: these aspects of the design are specified separately, when composing existing structures to form applications. This separation of concerns allows the same cells and templates to be reused in different contexts.

  11. Dipolar dark matter with massive bigravity

    SciTech Connect (OSTI)

    Blanchet, Luc; Heisenberg, Lavinia

    2015-12-14

    Massive gravity theories have been developed as viable IR modifications of gravity motivated by dark energy and the problem of the cosmological constant. On the other hand, modified gravity and modified dark matter theories were developed with the aim of solving the problems of standard cold dark matter at galactic scales. Here we propose to adapt the framework of ghost-free massive bigravity theories to reformulate the problem of dark matter at galactic scales. We investigate a promising alternative to dark matter called dipolar dark matter (DDM) in which two different species of dark matter are separately coupled to the two metrics of bigravity and are linked together by an internal vector field. We show that this model successfully reproduces the phenomenology of dark matter at galactic scales (i.e. MOND) as a result of a mechanism of gravitational polarisation. The model is safe in the gravitational sector, but because of the particular couplings of the matter fields and vector field to the metrics, a ghost in the decoupling limit is present in the dark matter sector. However, it might be possible to push the mass of the ghost beyond the strong coupling scale by an appropriate choice of the parameters of the model. Crucial questions to address in future work are the exact mass of the ghost, and the cosmological implications of the model.

  12. Knowledge Discovery from Massive Healthcare Claims Data

    SciTech Connect (OSTI)

    Chandola, Varun; Sukumar, Sreenivas R; Schryver, Jack C

    2013-01-01

    The role of big data in addressing the needs of the present healthcare system in the US and the rest of the world has been echoed by the government, private, and academic sectors. There has been a growing emphasis on exploring the promise of big data analytics in tapping the potential of the massive healthcare data emanating from private and government health insurance providers. While the domain implications of such collaboration are well known, this type of data has been explored only to a limited extent in the data mining community. The objective of this paper is twofold: first, we introduce the emerging domain of big healthcare claims data to the KDD community, and second, we describe the successes and challenges that we encountered in analyzing this data using state-of-the-art analytics for massive data. Specifically, we translate the problem of analyzing healthcare data into some of the most well-known analysis problems in the data mining community (social network analysis, text mining, temporal analysis, and higher-order feature construction), and describe how advances within each of these areas can be leveraged to understand the domain of healthcare. Each case study illustrates a unique intersection of data mining and healthcare with a common objective of improving the cost-care ratio by mining for opportunities to improve healthcare operations and reduce what appears to fall under fraud, waste, and abuse.

  13. Pair instability supernovae of very massive population III stars (Journal

    Office of Scientific and Technical Information (OSTI)

    Article) | SciTech Connect Pair instability supernovae of very massive population III stars Citation Details In-Document Search Title: Pair instability supernovae of very massive population III stars Numerical studies of primordial star formation suggest that the first stars in the universe may have been very massive. Stellar models indicate that non-rotating Population III stars with initial masses of 140-260 M{sub ☉} die as highly energetic pair-instability supernovae. We present new

  14. Parallel Programming and Optimization for Intel Architecture

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Parallel Programming and Optimization for Intel Architecture Parallel Programming and Optimization for Intel Architecture August 14, 2015 by Richard Gerber Intel is sponsoring a series of webinars entitled "Parallel Programming and Optimization for Intel Architecture." Here's the schedule for August (Registration link is: https://attendee.gotowebinar.com/register/6325131222429932289) Mon, August 17 - "Hello world from Intel Xeon Phi coprocessors". Overview of architecture,

  15. Redshift-space distortions in massive neutrino and evolving dark...

    Office of Scientific and Technical Information (OSTI)

    Title: Redshift-space distortions in massive neutrino and evolving dark energy cosmologies Authors: Upadhye, Amol ; Kwan, Juliana ; Pope, Adrian ; Heitmann, Katrin ; Habib, Salman ...

  16. Spectral function of a fermion coupled with a massive vector...

    Office of Scientific and Technical Information (OSTI)

    temperature in a gauge invariant formalism Citation Details In-Document Search Title: Spectral function of a fermion coupled with a massive vector boson at finite temperature in ...

  17. Parallel auto-correlative statistics with VTK.

    SciTech Connect (OSTI)

    Pebay, Philippe Pierre; Bennett, Janine Camille

    2013-08-01

    This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10], which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by means of C++ code snippets, and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the auto-correlative statistics engine.
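    At its core, the auto-correlative statistic is the lag-k autocorrelation of a series. A minimal Python sketch of the quantity being computed (the VTK engines are C++ and use parallel update formulas not reproduced here):

```python
def autocorrelation(x, lag):
    """Lag-k autocorrelation: covariance of the series with its shifted self,
    normalized by the variance (population normalization, denominator n)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

x = [1.0, -1.0] * 4          # perfectly alternating series
print(autocorrelation(x, 1), autocorrelation(x, 2))  # -0.875 0.75
```

    The alternating series anti-correlates with itself at lag 1 and correlates again at lag 2, as expected.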

  18. Broadcasting a message in a parallel computer

    DOE Patents [OSTI]

    Berg, Jeremy E. (Rochester, MN); Faraj, Ahmad A. (Rochester, MN)

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
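    One concrete way to realize such a path in a two-dimensional network is a serpentine (boustrophedon) walk, so that every hop between consecutive nodes is a mesh link. The Python sketch below illustrates the idea only; the patent's actual path-selection and forwarding mechanics are not reproduced:

```python
def hamiltonian_path(rows, cols):
    """Serpentine walk that visits every node of a rows x cols mesh once."""
    path = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in cs)
    return path

def broadcast(rows, cols, message):
    """Deposit the message at each node while forwarding it along the path."""
    path = hamiltonian_path(rows, cols)
    inbox = {}
    for node in path:      # the root is path[0]; every hop is a mesh neighbor
        inbox[node] = message
    return path, inbox

def mesh_neighbors(a, b):
    """True when two (row, col) nodes are adjacent in the mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

path, inbox = broadcast(3, 4, "hello")
print(len(path), all(mesh_neighbors(u, v) for u, v in zip(path, path[1:])))
# 12 True
```

    Because every hop is a single mesh link, broadcasting to all N nodes costs N - 1 hops with no contention, which is the appeal of the Hamiltonian-path approach on a point-to-point network.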

  19. Parallel 3D Finite Element Particle-in-Cell Simulations with Pic3P

    SciTech Connect (OSTI)

    Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Ben-Zvi, I.; Kewisch, J.; /Brookhaven

    2009-06-19

    SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic Particle-In-Cell code Pic3P. Designed for simulations of beam-cavity interactions dominated by space charge effects, Pic3P solves the complete set of Maxwell-Lorentz equations self-consistently and includes space-charge, retardation and boundary effects from first principles. Higher-order Finite Element methods with adaptive refinement on conformal unstructured meshes lead to highly efficient use of computational resources. Massively parallel processing with dynamic load balancing enables large-scale modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of next-generation accelerator facilities. Applications include the LCLS RF gun and the BNL polarized SRF gun.

  20. WCH Removes Massive Test Reactor | Department of Energy

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    WCH Removes Massive Test Reactor WCH Removes Massive Test Reactor Addthis Description Hanford's River Corridor contractor, Washington Closure Hanford, has met a significant cleanup challenge on the U.S. Department of Energy's (DOE) Hanford Site by removing a 1,082-ton nuclear test reactor from the 300 Area

  1. Xyce parallel electronic simulator : users' guide.

    SciTech Connect (OSTI)

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

  2. MACHO (MAssive Compact Halo Objects) Data

    DOE Data Explorer [Office of Scientific and Technical Information (OSTI)]

    The primary aim of the MACHO Project is to test the hypothesis that a significant fraction of the dark matter in the halo of the Milky Way is made up of objects like brown dwarfs or planets: these objects have come to be known as MACHOs, for MAssive Compact Halo Objects. The signature of these objects is the occasional amplification of the light from extragalactic stars by the gravitational lens effect. The amplification can be large, but events are extremely rare: it is necessary to monitor photometrically several million stars for a period of years in order to obtain a useful detection rate. For this purpose MACHO has a two channel system that employs eight CCDs, mounted on the 50 inch telescope at Mt. Stromlo. The high data rate (several GBytes per night) is accommodated by custom electronics and on-line data reduction. The Project has taken more than 27,000 images with this system since June 1992. Analysis of a subset of these data has yielded databases containing light curves in two colors for 8 million stars in the LMC and 10 million in the bulge of the Milky Way. A search for microlensing has turned up four candidates toward the Large Magellanic Cloud and 45 toward the Galactic Bulge. The web page for data provides links to MACHO Project data portals and various specialized interfaces for viewing or searching the data. (Specialized Interface)

  3. Dark aspects of massive spinor electrodynamics

    SciTech Connect (OSTI)

    Kim, Edward J.; Kouwn, Seyen; Oh, Phillial; Park, Chan-Gyung E-mail: seyen@ewha.ac.kr E-mail: parkc@jbnu.ac.kr

    2014-07-01

    We investigate the cosmology of massive spinor electrodynamics when torsion is non-vanishing. A non-minimal interaction is introduced between the torsion and the vector field, and the coupling constant between them plays an important role in the subsequent cosmology. It is shown that the mass of the vector field and torsion conspire to generate dark energy and pressureless dark matter, and for generic values of the coupling constant, the theory effectively provides an interacting model between them with an additional energy density of the form ∝ 1/a{sup 6}. The evolution equations mimic ΛCDM behavior up to the 1/a{sup 3} term, and the additional term represents a deviation from ΛCDM. We show that the deviation is compatible with the observational data if it is very small. We find that the non-minimal interaction is responsible for generating an effective cosmological constant which is directly proportional to the mass squared of the vector field, and that the mass of the photon, within its current observational limit, could be the source of the dark energy.

  4. System and method for a parallel immunoassay system

    DOE Patents [OSTI]

    Stevens, Fred J. (Naperville, IL)

    2002-01-01

    A method and system for detecting a target antigen using massively parallel immunoassay technology. In this system, high-affinity antibodies to the antigen are covalently linked to small beads or particles. The beads are exposed to a solution containing DNA-oligomer mimics of the antigen. The mimics that are reactive with the covalently attached antibody or antibodies will bind to the appropriate antibody molecules on the bead. The particles or beads are then washed to remove any unbound DNA-oligomer mimics and are then immobilized or trapped. The bead-antibody complexes are then exposed to a test solution which may contain the targeted antigens. If the antigen is present, it will replace the mimic, since it has a greater affinity for the respective antibody. The particles are then removed from the solution, leaving a residual solution. This residual solution is applied to a DNA chip containing many samples of complementary DNA. If the DNA tag from a mimic binds with its complementary DNA, it indicates the presence of the target antigen. A fluorescent tag can be used to more easily identify the bound DNA tag.

  5. Distributed parallel messaging for multiprocessor systems

    DOE Patents [OSTI]

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple packet reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.

  6. Parallel Climate Analysis Toolkit (ParCAT)

    Energy Science and Technology Software Center (OSTI)

    2013-06-30

    The parallel analysis toolkit (ParCAT) provides parallel statistical processing of large climate model simulation datasets. ParCAT provides parallel point-wise average calculations, frequency distributions, sum/differences of two datasets, and difference-of-average and average-of-difference for two datasets for arbitrary subsets of simulation time. ParCAT is a command-line utility that can be easily integrated in scripts or embedded in other applications. ParCAT supports CMIP5 post-processed datasets as well as non-CMIP5 post-processed datasets. ParCAT reads and writes standard netCDF files.
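
    The point-wise operations ParCAT describes can be sketched with plain NumPy arrays standing in for netCDF variables (the array shapes, variable names, and time subset below are illustrative assumptions, not ParCAT's API):

```python
import numpy as np

# Hypothetical stand-ins for two post-processed model runs: arrays shaped
# (time, lat, lon). Shapes and values are illustrative only.
rng = np.random.default_rng(0)
run_a = rng.normal(10.0, 2.0, size=(120, 4, 8))
run_b = rng.normal(12.0, 2.0, size=(120, 4, 8))

months = slice(0, 60)            # an arbitrary subset of simulation time

# Point-wise average over the time subset.
avg_a = run_a[months].mean(axis=0)

# Difference-of-averages vs. average-of-differences for the two runs.
diff_of_avg = run_a[months].mean(axis=0) - run_b[months].mean(axis=0)
avg_of_diff = (run_a[months] - run_b[months]).mean(axis=0)
# For equal-length, time-aligned subsets the two operations agree
# point-wise; they differ when the runs cover different time ranges.

# Frequency distribution of one run over the subset.
hist, edges = np.histogram(run_a[months], bins=20)
```

    In ParCAT these reductions run in parallel across processes; the sketch shows only the serial semantics of each operation.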

  7. New approach for the solution of optimal control problems on parallel machines. Doctoral thesis

    SciTech Connect (OSTI)

    Stech, D.J.

    1990-01-01

    This thesis develops a highly parallel solution method for nonlinear optimal control problems. Balakrishnan's epsilon method is used in conjunction with the Rayleigh-Ritz method to convert the dynamic optimization of the optimal control problem into a static optimization problem. Walsh functions and orthogonal polynomials are used as basis functions to implement the Rayleigh-Ritz method. The resulting static optimization problem is solved using matrix operations which have well-defined massively parallel solution methods. To demonstrate the method, a variety of nonlinear optimal control problems are solved. The nonlinear Rayleigh problem with quadratic cost and the nonlinear van der Pol problem with quadratic cost and terminal constraints on the states are solved both serially and in parallel on an eight-processor Intel Hypercube. The solutions using both Walsh functions and Legendre polynomials as basis functions are given. In addition to these problems, which are solved in parallel, a more complex nonlinear minimum-time optimal control problem and a nonlinear optimal control problem with an inequality constraint on the control are solved. Results show the method converges quickly, even from relatively poor initial guesses for the nominal trajectories.
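
    The epsilon-method reduction can be illustrated on a toy problem (the problem, basis, and penalty weight below are assumptions for illustration, not the thesis's test cases): minimize the integral of u² subject to ẋ = u, x(0)=0, x(1)=1, whose exact optimum is u ≡ 1. Penalizing the dynamics residual (ẋ - u)²/ε and expanding x and u in a small polynomial basis (standing in for the Walsh/Legendre bases) turns the optimal control problem into a static least-squares problem:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 201)
eps = 1e-4   # epsilon-method penalty weight (assumed)

# x(t) = t + a*t*(1-t) satisfies both boundary conditions for any a;
# u(t) = b0 + b1*t. Stack the cost residual u and the scaled dynamics
# residual (xdot - u)/sqrt(eps) into one linear system in (a, b0, b1).
A_cost = np.column_stack([np.zeros_like(t), np.ones_like(t), t])
y_cost = np.zeros_like(t)
A_dyn = np.column_stack([1.0 - 2.0 * t, -np.ones_like(t), -t]) / np.sqrt(eps)
y_dyn = -np.ones_like(t) / np.sqrt(eps)

A = np.vstack([A_cost, A_dyn])
y = np.concatenate([y_cost, y_dyn])
a, b0, b1 = np.linalg.lstsq(A, y, rcond=None)[0]

u = b0 + b1 * t   # recovered control; should be near the exact optimum u = 1
```

    The static problem is built entirely from matrix operations, which is what makes this reduction amenable to massively parallel solution.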

  8. Adding Data Management Services to Parallel File Systems

    SciTech Connect (OSTI)

    Brandt, Scott

    2015-03-04

    The objective of this project, called DAMASC for “Data Management in Scientific Computing”, is to coalesce data management with parallel file system management to present a declarative interface to scientists for managing, querying, and analyzing extremely large data sets efficiently and predictably. Managing extremely large data sets is a key challenge of exascale computing. The overhead, energy, and cost of moving massive volumes of data demand designs where computation is close to storage. In current architectures, compute/analysis clusters access data in a physically separate parallel file system and largely leave it to the scientist to reduce data movement. Over the past decades the high-end computing community has adopted middleware with multiple layers of abstractions and specialized file formats such as NetCDF-4 and HDF5. These abstractions provide a limited set of high-level data processing functions, but have inherent functionality and performance limitations: middleware that provides access to the highly structured contents of scientific data files stored in the (unstructured) file systems can only optimize to the extent that file system interfaces permit; the highly structured formats of these files often impede native file system performance optimizations. We are developing Damasc, an enhanced high-performance file system with native rich data management services. Damasc will enable efficient queries and updates over files stored in their native byte-stream format while retaining the inherent performance of file system data storage via declarative queries and updates over views of underlying files.
Damasc has four key benefits for the development of data-intensive scientific code: (1) applications can use important data-management services, such as declarative queries, views, and provenance tracking, that are currently available only within database systems; (2) the use of these services becomes easier, as they are provided within a familiar file-based ecosystem; (3) common optimizations, e.g., indexing and caching, are readily supported across several file formats, avoiding effort duplication; and (4) performance improves significantly, as data processing is integrated more tightly with data storage. Our key contributions are: SciHadoop, which explores changes to MapReduce's assumptions by taking advantage of the semantics of structured data while preserving MapReduce's failure and resource management; DataMods, which extends the common abstractions of parallel file systems to make them programmable, so that they can natively support a variety of data models and can be hooked into emerging distributed runtimes such as Stanford's Legion; and Miso, which combines Hadoop and relational data warehousing to minimize time to insight, taking into account the overhead of ingesting data into the warehouse.
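
    The declarative-query idea can be sketched with a toy example (the binary record layout and SQL schema below are invented for illustration; Damasc itself provides these services inside the file system rather than through SQLite): rows are parsed lazily from a file kept in its native byte-stream format and exposed to a declarative query without a separate ingest step.

```python
import io
import sqlite3
import struct

# Assumed record layout: each record is (int32 station_id, float64 temp),
# little-endian, packed back to back in a native binary file.
raw = io.BytesIO()
for rec in [(1, 20.5), (2, 31.0), (1, 19.5), (3, 25.0)]:
    raw.write(struct.pack("<id", *rec))

def records(buf):
    """Lazily parse fixed-size records from the native byte stream."""
    buf.seek(0)
    size = struct.calcsize("<id")
    while chunk := buf.read(size):
        yield struct.unpack("<id", chunk)

# Expose the file's contents to a declarative query via an in-memory view.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (station INTEGER, temp REAL)")
db.executemany("INSERT INTO readings VALUES (?, ?)", records(raw))
rows = db.execute(
    "SELECT station, AVG(temp) FROM readings GROUP BY station ORDER BY station"
).fetchall()
```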

  9. Parallel I/O in Practice

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    art. This tutorial sheds light on the state-of-the-art in parallel IO and provides the knowledge necessary for attendees to best leverage IO resources available to them. We...

  10. Parallel programming with PCN. Revision 1

    SciTech Connect (OSTI)

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

  11. Asynchronous parallel pattern search for nonlinear optimization

    SciTech Connect (OSTI)

    P. D. Hough; T. G. Kolda; V. J. Torczon

    2000-01-01

    Parallel pattern search (PPS) can be quite useful for engineering optimization problems characterized by a small number of variables (say 10--50) and by expensive objective function evaluations such as complex simulations that take from minutes to hours to run. However, PPS, which was originally designed for execution on homogeneous and tightly-coupled parallel machines, is not well suited to the more heterogeneous, loosely-coupled, and even fault-prone parallel systems available today. Specifically, PPS is hindered by synchronization penalties and cannot recover in the event of a failure. The authors introduce a new asynchronous and fault-tolerant parallel pattern search (APPS) method and demonstrate its effectiveness on both simple test problems and some engineering optimization problems.
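
    A minimal pattern-search sketch with parallel trial evaluations (a simplified stand-in for APPS, not the authors' implementation; the objective, step rules, and worker count are illustrative assumptions):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def objective(x):
    # Stand-in for an expensive simulation: a shifted quadratic.
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

def pattern_search(f, x0, step=1.0, tol=1e-3, max_iter=200):
    x, fx = list(x0), f(x0)
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(max_iter):
            if step < tol:
                break
            # Poll the 2n compass directions; evaluate trials in parallel.
            trials = []
            for i in range(len(x)):
                for s in (step, -step):
                    y = list(x)
                    y[i] += s
                    trials.append(y)
            futures = {pool.submit(f, y): y for y in trials}
            best = None
            # Asynchronous flavor: accept the first improving trial to
            # finish instead of waiting for every evaluation.
            for fut in as_completed(futures):
                val = fut.result()
                if val < fx:
                    best, fx = futures[fut], val
                    break
            if best is None:
                step *= 0.5          # no improvement: contract the pattern
            else:
                x = best
    return x, fx

x, fx = pattern_search(objective, [0.0, 0.0])
```

    Accepting the first improving point to return, rather than synchronizing on all trials, is the property that avoids the synchronization penalties the abstract describes; fault tolerance (recovering from workers that never return) is the further step APPS adds.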

  12. Feature Clustering for Accelerating Parallel Coordinate Descent

    SciTech Connect (OSTI)

    Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.

    2012-12-06

    We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
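
    A rough sketch of the idea (using lasso least squares rather than the paper's L1 logistic regression, and a simple correlation-threshold clustering as a stand-in for their preconditioning step; all data and thresholds are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 6
X = rng.normal(size=(n, d))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=n)   # a correlated feature pair
w_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=n)

# Cluster features by pairwise correlation (assumed threshold 0.9).
corr = np.corrcoef(X, rowvar=False)
clusters, seen = [], set()
for j in range(d):
    if j in seen:
        continue
    group = [k for k in range(d) if abs(corr[j, k]) > 0.9 and k not in seen]
    seen.update(group)
    clusters.append(group)

# Block-greedy coordinate descent for the lasso: per sweep, update at most
# one coordinate per cluster (updates across clusters are independent and
# could run in parallel, which is the point of the preconditioning).
lam = 0.1
w = np.zeros(d)
col_sq = (X ** 2).sum(axis=0)
for _ in range(100):
    r = y - X @ w
    for group in clusters:
        # Greedy choice: the in-cluster coordinate with the largest gradient.
        grads = [abs(X[:, k] @ r) for k in group]
        k = group[int(np.argmax(grads))]
        rho = X[:, k] @ r + col_sq[k] * w[k]
        w_new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
        r -= X[:, k] * (w_new - w[k])
        w[k] = w_new
```

    Grouping strongly correlated features into one cluster means no two of them are updated concurrently, which is what makes the parallel block updates safe.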

  13. PISTON (Portable Data Parallel Visualization and Analysis)

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    in a data-parallel way. By using nVidia's freely downloadable Thrust library and our own tools, we can generate executable codes for different acceleration hardware architectures...

  14. Paradyn a parallel nonlinear, explicit, three-dimensional finite-element code for solid and structural mechanics user manual

    SciTech Connect (OSTI)

    Hoover, C G; DeGroot, A J; Sherwood, R J

    2000-06-01

    ParaDyn is a parallel version of the DYNA3D computer program, a three-dimensional explicit finite-element program for analyzing the dynamic response of solids and structures. The ParaDyn program has been used as a production tool for over three years for analyzing problems which range in size from a few tens of thousands of elements to between one-million and ten-million elements. ParaDyn runs on parallel computers provided by the Department of Energy Accelerated Strategic Computing Initiative (ASCI) and the Department of Defense High Performance Computing and Modernization Program. Preprocessing and post-processing software utilities and tools are designed to facilitate the generation of partitioned domains for processors on a massively parallel computer and the visualization of both resultant data and boundary data generated in a parallel simulation. This manual provides a brief overview of the parallel implementation; describes techniques for running the ParaDyn program, tools and utilities; and provides examples of parallel simulations.

  15. HOPSPACK: Hybrid Optimization Parallel Search Package.

    SciTech Connect (OSTI)

    Gray, Genetha A.; Kolda, Tamara G.; Griffin, Joshua; Taddy, Matt; Martinez-Canales, Monica

    2008-12-01

    In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel Search Package), a new software platform which facilitates combining multiple optimization routines into a single, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The framework is designed such that existing optimization source code can be easily incorporated with minimal code modification. By maintaining the integrity of each individual solver, the strengths and code sophistication of the original optimization package are retained and exploited.

  16. Large N phase transitions in massive N = 2 gauge theories

    SciTech Connect (OSTI)

    Russo, J. G.

    2014-07-23

    Using exact results obtained from localization on S{sup 4}, we explore the large N limit of N = 2 super Yang-Mills theories with massive matter multiplets. In this talk we discuss two cases: N = 2* theory, describing a massive hypermultiplet in the adjoint representation, and super QCD with massive quarks. When the radius of the four-sphere is sent to infinity these theories are described by solvable matrix models, which exhibit a number of interesting phenomena including quantum phase transitions at finite 't Hooft coupling.

  17. SEGUE 2: THE LEAST MASSIVE GALAXY

    SciTech Connect (OSTI)

    Kirby, Evan N.; Boylan-Kolchin, Michael; Bullock, James S.; Kaplinghat, Manoj; Cohen, Judith G.; Geha, Marla

    2013-06-10

    Segue 2, discovered by Belokurov et al., is a galaxy with a luminosity of only 900 L{sub Sun}. We present Keck/DEIMOS spectroscopy of 25 members of Segue 2, a threefold increase in spectroscopic sample size. The velocity dispersion is too small to be measured with our data. The upper limit with 90% (95%) confidence is {sigma}{sub v} < 2.2 (2.6) km s{sup -1}, the most stringent limit for any galaxy. The corresponding limit on the mass within the three-dimensional half-light radius (46 pc) is M{sub 1/2} < 1.5 (2.1) × 10{sup 5} M{sub Sun}. Segue 2 is the least massive galaxy known. We identify Segue 2 as a galaxy rather than a star cluster based on the wide dispersion in [Fe/H] (from -2.85 to -1.33) among the member stars. The stars' [{alpha}/Fe] ratios decline with increasing [Fe/H], indicating that Segue 2 retained Type Ia supernova ejecta despite its presently small mass and that star formation lasted for at least 100 Myr. The mean metallicity, ([Fe/H]) = -2.22 {+-} 0.13 (about the same as the Ursa Minor galaxy, 330 times more luminous than Segue 2), is higher than expected from the luminosity-metallicity relation defined by more luminous dwarf galaxy satellites of the Milky Way. Segue 2 may be the barest remnant of a tidally stripped, Ursa Minor-sized galaxy. If so, it is the best example of an ultra-faint dwarf galaxy that came to be ultra-faint through tidal stripping. Alternatively, Segue 2 could have been born in a very low mass dark matter subhalo (v{sub max} < 10 km s{sup -1}), below the atomic hydrogen cooling limit.

  18. Workers Pour 1 Million Gallons of Grout into Massive Tanks

    Broader source: Energy.gov [DOE]

    AIKEN, S.C. Workers have poured more than 1 million gallons of a cement-like grout into two underground radioactive waste tanks, moving the Savannah River Site (SRS) nearer to closing the massive structures.

  19. Secretary Chu Announces New Institute to Help Scientists Improve Massive

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Data Set Research on DOE Supercomputers | Department of Energy. March 29, 2012 - 2:48pm. Washington, D.C. - Energy Secretary Steven Chu today announced $5 million to establish the Scalable Data Management, Analysis and Visualization (SDAV) Institute as part of the Obama Administration's

  20. Model for Thermal Relic Dark Matter of Strongly Interacting Massive

    Office of Scientific and Technical Information (OSTI)

    Particles (Journal Article) | SciTech Connect. This content will become publicly available on July 9, 2016. Authors: Hochberg, Yonit; Kuflik, Eric; Murayama, Hitoshi; Volansky, Tomer; Wacker, Jay G. Publication Date: 2015-07-10. OSTI Identifier: 1193520. Grant/Contract Number: AC02-05CH11231

  1. DARK MATTER HALO PROFILES OF MASSIVE CLUSTERS: THEORY VERSUS OBSERVATIONS

    Office of Scientific and Technical Information (OSTI)

    (Journal Article) | SciTech Connect. Dark-matter-dominated cluster-scale halos act as an important cosmological probe and provide a key testing ground for structure formation theory. Focusing on their mass profiles, we have carried out (gravity-only) simulations of the concordance {Lambda}CDM cosmology,

  2. Massive Cement Pour into Hanford Site Nuclear Facility Underway: Recovery

    Office of Environmental Management (EM)

    Act Funding Puts U Canyon in Home Stretch of Demolition Preparations | Department of Energy. June 14, 2011 - 12:00pm. Media Contact: Andre Armstrong, CH2M HILL Plateau Remediation Company, (509) 376-6773

  3. Massively Multi-core Acceleration of a Document-Similarity Classifier to Detect Web Attacks

    SciTech Connect (OSTI)

    Ulmer, C; Gokhale, M; Top, P; Gallagher, B; Eliassi-Rad, T

    2010-01-14

    This paper describes our approach to adapting a text document similarity classifier based on the Term Frequency Inverse Document Frequency (TFIDF) metric to two massively multi-core hardware platforms. The TFIDF classifier is used to detect web attacks in HTTP data. In our parallel hardware approaches, we design streaming, real time classifiers by simplifying the sequential algorithm and manipulating the classifier's model to allow decision information to be represented compactly. Parallel implementations on the Tilera 64-core System on Chip and the Xilinx Virtex 5-LX FPGA are presented. For the Tilera, we employ a reduced state machine to recognize dictionary terms without requiring explicit tokenization, and achieve throughput of 37MB/s at slightly reduced accuracy. For the FPGA, we have developed a set of software tools to help automate the process of converting training data to synthesizable hardware and to provide a means of trading off between accuracy and resource utilization. The Xilinx Virtex 5-LX implementation requires 0.2% of the memory used by the original algorithm. At 166MB/s (80X the software) the hardware implementation is able to achieve Gigabit network throughput at the same accuracy as the original algorithm.
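
    The dictionary-matching and TFIDF-weighting steps can be sketched as follows (the term dictionary and IDF weights are invented for illustration; the real classifier's model is learned from training data). A trie walk from each stream position recognizes dictionary terms without explicit tokenization, in the spirit of the reduced state machine the paper describes:

```python
import math

terms = ["select", "union", "script", "alert"]                 # hypothetical terms
idf = {"select": 1.2, "union": 2.3, "script": 1.7, "alert": 2.0}  # assumed weights

# Build a character trie; "$" marks a completed dictionary term.
trie = {}
for t in terms:
    node = trie
    for ch in t:
        node = node.setdefault(ch, {})
    node["$"] = t

def term_counts(stream):
    """Count dictionary terms by walking the trie from every position,
    with no explicit tokenization of the byte stream."""
    counts = {t: 0 for t in terms}
    for i in range(len(stream)):
        node = trie
        for ch in stream[i:]:
            if ch not in node:
                break
            node = node[ch]
            if "$" in node:
                counts[node["$"]] += 1
    return counts

def tfidf_vector(stream):
    counts = term_counts(stream.lower())
    vec = {t: counts[t] * idf[t] for t in terms}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return vec, norm

vec, norm = tfidf_vector("GET /item?id=1 UNION SELECT password FROM users")
```

    Classification would then compare this vector against per-class model vectors by cosine similarity; the hardware versions compact exactly this model representation.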

  4. Chassis Dynamometer Testing of Parallel and Series Diesel Hybrid...

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Chassis Dynamometer Testing of Parallel and Series Diesel Hybrid Buses Chassis Dynamometer Testing of Parallel and Series Diesel Hybrid Buses Emissions and fuel economy data were ...

  5. Parallel Botulinum Neurotoxin/A Immuno- and Enzyme Activity Assays...

    Office of Scientific and Technical Information (OSTI)

    Conference: Parallel Botulinum NeurotoxinA Immuno- and Enzyme Activity Assays Using the Versatile RapiDx Platform. Citation Details In-Document Search Title: Parallel Botulinum ...

  6. Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications...

    Office of Scientific and Technical Information (OSTI)

    Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications Citation Details In-Document Search Title: Linux Kernel Co-Scheduling For Bulk Synchronous Parallel ...

  7. The structural simulation toolkit: a tool for exploring parallel...

    Office of Scientific and Technical Information (OSTI)

    for exploring parallel architectures and applications. Citation Details In-Document Search Title: The structural simulation toolkit: a tool for exploring parallel architectures ...

  8. The Swift Parallel Scripting Language for ALCF Systems | Argonne...

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    bgclang Compiler Cobalt Scheduler GLEAN Petrel Swift The Swift Parallel Scripting Language for ALCF Systems Swift is an implicitly parallel functional language that makes it...

  9. Parallel Harness for Informatic Stream Hashing

    Energy Science and Technology Software Center (OSTI)

    2012-09-11

    PHISH is a lightweight framework that a set of independent processes can use to exchange data as they run on the same desktop machine, on processors of a parallel machine, or on different machines across a network. This enables them to work in a coordinated parallel fashion to perform computations on either streaming, archived, or self-generated data. The PHISH distribution includes a simple, portable library for performing data exchanges in useful patterns either via MPI message-passing or ZMQ sockets. PHISH input scripts are used to describe a data-processing algorithm, and additional tools provided in the PHISH distribution convert the script into a form that can be launched as a parallel job.
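
    The coordinated-processes pattern PHISH provides can be sketched with a three-stage streaming pipeline (threads and in-memory queues stand in here for the MPI/ZMQ-connected processes; the stage names are illustrative, not the PHISH API):

```python
from queue import Queue
from threading import Thread

def source(out_q, n):
    for i in range(n):
        out_q.put(i)               # stream datums downstream
    out_q.put(None)                # end-of-stream marker

def worker(in_q, out_q):
    while (datum := in_q.get()) is not None:
        out_q.put(datum * datum)   # per-datum computation
    out_q.put(None)                # forward end-of-stream

def sink(in_q, result_q):
    total = 0
    while (datum := in_q.get()) is not None:
        total += datum
    result_q.put(total)

q1, q2, rq = Queue(), Queue(), Queue()
stages = [Thread(target=source, args=(q1, 10)),
          Thread(target=worker, args=(q1, q2)),
          Thread(target=sink, args=(q2, rq))]
for t in stages:
    t.start()
total = rq.get()                   # sum of squares of 0..9
for t in stages:
    t.join()
```

    In PHISH the same stage graph is described by an input script, and each stage runs as an independent process connected by MPI messages or ZMQ sockets rather than in-process queues.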

  10. Parallel Implementation of Power System Dynamic Simulation

    SciTech Connect (OSTI)

    Jin, Shuangshuang; Huang, Zhenyu; Diao, Ruisheng; Wu, Di; Chen, Yousu

    2013-07-21

    Dynamic simulation of power system transient stability is important for planning, monitoring, operation, and control of electrical power systems. However, modeling the system dynamics and network involves the computationally intensive time-domain solution of numerous differential and algebraic equations (DAE). This results in a transient stability implementation that may not meet the real-time constraints of an online security assessment. This paper presents a parallel implementation of the dynamic simulation on a high-performance computing (HPC) platform using parallel simulation algorithms and computation architectures. It allows the simulation to run even faster than real time, enabling look-ahead analysis of upcoming stability problems in the power grid.

  11. Berkeley Unified Parallel C (UPC) Compiler

    Energy Science and Technology Software Center (OSTI)

    2003-04-06

    This program is a portable, open-source compiler for the UPC language, which is based on the Open64 framework and has extensive support for optimizations. This compiler operates by translating UPC into ANSI/ISO C for compilation by a native compiler and linking with a UPC Runtime Library. This design eases portability to both shared- and distributed-memory parallel architectures. For proper operation the "Berkeley Unified Parallel C (UPC) Runtime Library" and its dependencies are required. Compatible replacements which implement "The Berkeley UPC Runtime Specification" are possible.

  12. Xyce parallel electronic simulator release notes.

    SciTech Connect (OSTI)

    Keiter, Eric Richard; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: Hardware and software requirements New features and enhancements Any defects fixed since the last release Current known defects and defect workarounds For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.

  13. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  14. Parallel Performance of a Combustion Chemistry Simulation

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Skinner, Gregg; Eigenmann, Rudolf

    1995-01-01

    We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.

  15. Linked-View Parallel Coordinate Plot Renderer

    Energy Science and Technology Software Center (OSTI)

    2011-06-28

    This software allows multiple linked views for interactive querying via map-based data selection, bar chart analytic overlays, and high dynamic range (HDR) line renderings. The major component of the visualization package is a parallel coordinate renderer with binning, curved layouts, shader-based rendering, and other techniques to allow interactive visualization of multidimensional data.

  16. Communication Graph Generator for Parallel Programs

    Energy Science and Technology Software Center (OSTI)

    2014-04-08

    Graphator is a collection of relatively simple sequential programs that generate communication graphs/matrices for commonly occurring patterns in parallel programs. Currently, there is support for five communication patterns: two-dimensional 4-point stencil, four-dimensional 8-point stencil, all-to-alls over sub-communicators, random near-neighbor communication, and near-neighbor communication.
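
    For the first pattern, a sketch of generating the communication matrix for a two-dimensional 4-point stencil (the grid shape, row-major rank layout, and 0/1 weights are assumptions; Graphator's actual output format may differ):

```python
def stencil_2d_comm(px, py):
    """Adjacency matrix for ranks laid out row-major on a px-by-py grid,
    each rank exchanging with its north/south/east/west neighbors."""
    n = px * py
    mat = [[0] * n for _ in range(n)]
    for r in range(n):
        x, y = r % px, r // px
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < px and 0 <= ny < py:   # no periodic wraparound
                mat[r][ny * px + nx] = 1
    return mat

m = stencil_2d_comm(3, 3)
# On a 3x3 grid, corner ranks have 2 neighbors, edge ranks 3, the center 4.
degrees = [sum(row) for row in m]
```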

  17. Parallel programming with PCN. Revision 2

    SciTech Connect (OSTI)

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  18. Message passing with parallel queue traversal

    DOE Patents [OSTI]

    Underwood, Keith D.; Brightwell, Ronald B.; Hemmert, K. Scott

    2012-05-01

    In message passing implementations, associative matching structures are used to permit list entries to be searched in parallel fashion, thereby avoiding the delay of linear list traversal. List management capabilities are provided to support list entry turnover semantics and priority ordering semantics.

  19. Requirements for Parallel I/O,

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    efficient, manycore architectures - Parallel I/O: hardware/software stack is in flux. Don't have sufficient funding at the moment - Analysis: identify Big Data m...

  20. Cosmological stability bound in massive gravity and bigravity

    SciTech Connect (OSTI)

    Fasiello, Matteo; Tolley, Andrew J. E-mail: andrew.j.tolley@case.edu

    2013-12-01

    We give a simple derivation of a cosmological bound on the graviton mass for spatially flat FRW solutions in massive gravity with an FRW reference metric and for bigravity theories. This bound comes from the requirement that the kinetic term of the helicity-zero mode of the graviton is positive definite. The bound is dependent only on the parameters in the massive gravity potential and the Hubble expansion rate for the two metrics. We derive the decoupling limit of bigravity and FRW massive gravity, and use this to give an independent derivation of the cosmological bound. We recover our previous results that the tension between satisfying the Friedmann equation and the cosmological bound is sufficient to rule out all observationally relevant FRW solutions for massive gravity with an FRW reference metric. In contrast, in bigravity this tension is resolved due to the different nature of the Vainshtein mechanism. We find that in bigravity theories there exists an FRW solution with late-time self-acceleration for which the kinetic terms for the helicity-2, helicity-1 and helicity-0 modes are generically nonzero and positive, making this a compelling candidate for a model of cosmic acceleration. We confirm that the generalized bound is saturated for the candidate partially massless (bi)gravity theories, but the existence of helicity-1/helicity-0 interactions implies the absence of the conjectured partially massless symmetry for both massive gravity and bigravity.

  1. Spontaneous Lorentz and diffeomorphism violation, massive modes, and gravity

    SciTech Connect (OSTI)

    Bluhm, Robert; Fung Shuhong; Kostelecky, V. Alan

    2008-03-15

    Theories with spontaneous local Lorentz and diffeomorphism violation contain massless Nambu-Goldstone modes, which arise as field excitations in the minimum of the symmetry-breaking potential. If the shape of the potential also allows excitations above the minimum, then an alternative gravitational Higgs mechanism can occur in which massive modes involving the metric appear. The origin and basic properties of the massive modes are addressed in the general context involving an arbitrary tensor vacuum value. Special attention is given to the case of bumblebee models, which are gravitationally coupled vector theories with spontaneous local Lorentz and diffeomorphism violation. Mode expansions are presented in both local and spacetime frames, revealing the Nambu-Goldstone and massive modes via decomposition of the metric and bumblebee fields, and the associated symmetry properties and gauge fixing are discussed. The class of bumblebee models with kinetic terms of the Maxwell form is used as a focus for more detailed study. The nature of the associated conservation laws and the interpretation as a candidate alternative to Einstein-Maxwell theory are investigated. Explicit examples involving smooth and Lagrange-multiplier potentials are studied to illustrate features of the massive modes, including their origin, nature, dispersion laws, and effects on gravitational interactions. In the weak static limit, the massive mode and Lagrange-multiplier fields are found to modify the Newton and Coulomb potentials. The nature and implications of these modifications are examined.

  2. Translation invariant time-dependent solutions to massive gravity

    SciTech Connect (OSTI)

    Mourad, J.; Steer, D.A. E-mail: steer@apc.univ-paris7.fr

    2013-12-01

    Homogeneous time-dependent solutions of massive gravity generalise the plane wave solutions of the linearised Fierz-Pauli equations for a massive spin-two particle, as well as the Kasner solutions of General Relativity. We show that they also allow a clear counting of the degrees of freedom and represent a simplified framework to work out the constraints, the equations of motion and the initial value formulation. We work in the vielbein formulation of massive gravity, find the phase space resulting from the constraints and show that several disconnected sectors of solutions exist some of which are unstable. The initial values determine the sector to which a solution belongs. Classically, the theory is not pathological but quantum mechanically the theory may suffer from instabilities. The latter are not due to an extra ghost-like degree of freedom.

  3. Massive gravitational waves in Chern-Simons modified gravity

    SciTech Connect (OSTI)

    Myung, Yun Soo; Moon, Taeyoon E-mail: tymoon@inje.ac.kr

    2014-10-01

    We consider the nondynamical Chern-Simons (nCS) modified gravity, which is regarded as a parity-odd theory of massive gravity in four dimensions. We first find polarization modes of gravitational waves for θ=x/μ in nCS modified gravity by using the Newman-Penrose formalism, where the null complex tetrad is necessary to specify gravitational waves. We show that in the Newman-Penrose formalism, the number of polarization modes is one in addition to an unspecified Ψ{sub 4}, implying three degrees of freedom for θ=x/μ. This compares with two for a canonical embedding of θ=t/μ. Also, if one introduces the Ricci tensor formalism to describe a massive graviton arising from the nCS modified gravity, one finds one massive mode after deriving second-order wave equations, compared to the five found in the parity-even Einstein-Weyl gravity.

  4. Massive Stars in Colliding Wind Systems: the GLAST Perspective

    SciTech Connect (OSTI)

    Reimer, Anita; Reimer, Olaf; /Stanford U., HEPL /KIPAC, Menlo Park

    2011-11-29

    Colliding winds of massive stars in binary systems are considered as candidate sites of high-energy non-thermal photon emission. They are already among the suggested counterparts for a few individual unidentified EGRET sources, but may constitute a detectable source population for the GLAST observatory. The present work investigates such a population study of massive colliding wind systems at high-energy gamma rays. Based on the recent detailed model (Reimer et al. 2006) for non-thermal photon production in prime candidate systems, we unveil the expected characteristics of this source class in the observables accessible at LAT energies. Combining the broadband emission model with the presently cataloged distribution of such systems and their individual parameters allows us to estimate the expected maximum number of LAT detections among massive stars in colliding wind binary systems.

  5. Hanford Waste Treatment Plant Sets Massive Protective Shield door in

    Office of Environmental Management (EM)

    Pretreatment Facility | Department of Energy: Hanford Waste Treatment Plant Sets Massive Protective Shield Door in Pretreatment Facility. January 12, 2011 - 12:00pm. The 102-ton door was set on top of the 85-ton door that was installed at the end of December; together, the carbon steel doors form an upside-down L-shape.

  6. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Data communications in a parallel active messaging interface ('PAMI') of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specification of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint, and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.

  7. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-29

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
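
    The endpoint triple these records describe (client, context, task) can be sketched as a small data structure. The Python below is only an illustrative model of the abstraction, not the actual PAMI C API; the `Endpoint` class and `send` function are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    client: str   # the client owning the communications resources
    context: int  # the context: a slice of communications resources
    task: int     # the task: identifies the compute node / process

def send(origin: Endpoint, target: Endpoint, data: bytes, instruction_type: str) -> dict:
    """Model a data communications instruction: transfer `data` from the
    origin endpoint to the target endpoint per the instruction type."""
    return {
        "type": instruction_type,
        "from": (origin.client, origin.context, origin.task),
        "to": (target.client, target.context, target.task),
        "payload": data,
    }

origin = Endpoint("app", 0, 0)
target = Endpoint("app", 0, 1)
msg = send(origin, target, b"transfer-data", "eager")
```

    The point of the three-part address is that one thread of execution can own several contexts, so progress on different communications can be driven independently.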

  8. Locating hardware faults in a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-04-13

    Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.

  9. Parallel machine architecture for production rule systems

    DOE Patents [OSTI]

    Allen, Jr., John D.; Butler, Philip L.

    1989-01-01

    A parallel processing system for production rule programs utilizes a host processor for storing production rule right-hand sides (RHS) and a plurality of rule processors for storing left-hand sides (LHS). The rule processors operate in parallel in the Recognize phase of the system's Recognize-Act cycle to match their respective LHSs against a stored list of working memory elements (WMEs) in order to find a self-consistent set of WMEs. The list of WMEs is dynamically varied during the Act phase of the system, in which the host executes, or fires, rule RHSs for those rules for which a self-consistent set has been found by the rule processors. The host transmits instructions for creating or deleting working memory elements as dictated by the rule firings until the rule processors are unable to find any further self-consistent working memory element sets, at which time the production rule system is halted.
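
    A toy, sequential sketch of the Recognize-Act cycle described above (in the patent, the LHS matching is farmed out to the rule processors in parallel); the rule names and facts here are invented for illustration.

```python
# Working memory is a set of tuples (WMEs); each rule is a name, an LHS
# predicate over working memory, and an RHS given as (adds, deletes).
wmes = {("goal", "boil-water"), ("have", "kettle")}

rules = [
    ("fill", lambda wm: ("have", "kettle") in wm and ("kettle", "full") not in wm,
             ({("kettle", "full")}, set())),
    ("heat", lambda wm: ("kettle", "full") in wm and ("water", "hot") not in wm,
             ({("water", "hot")}, {("goal", "boil-water")})),
]

fired = []
while True:
    # Recognize: match every LHS against the current working memory.
    matches = [(name, rhs) for name, lhs, rhs in rules if lhs(wmes)]
    if not matches:
        break  # no rule matches: the system halts
    # Act: fire a matched rule, creating/deleting WMEs as its RHS dictates.
    name, (adds, dels) = matches[0]
    wmes |= adds
    wmes -= dels
    fired.append(name)
```

    The parallelism in the patent lives entirely in the Recognize step, which is also the dominant cost in real production systems.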

  10. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    SciTech Connect (OSTI)

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving, only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two-dimensional and three-dimensional surface geometries, and to compare the resulting parallel-produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  11. Parallel Molecular Dynamics Program for Molecules

    Energy Science and Technology Software Center (OSTI)

    1995-03-07

    ParBond is a parallel classical molecular dynamics code that models bonded molecular systems, typically of an organic nature. It uses classical force fields for both non-bonded Coulombic and van der Waals interactions and for 2-, 3-, and 4-body bonded (bond, angle, dihedral, and improper) interactions. It integrates Newton's equations of motion for the molecular system and evaluates various thermodynamic properties of the system as it progresses.
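
    A minimal sketch of integrating Newton's equations of motion, as the record describes, using velocity Verlet on a 1-D harmonic "bond"; the force constant and step size are illustrative, not ParBond's actual force field.

```python
def velocity_verlet(x, v, force, m=1.0, dt=0.01, steps=1000):
    """Integrate m*x'' = force(x) with the symplectic velocity Verlet scheme."""
    f = force(x)
    for _ in range(steps):
        v += 0.5 * dt * f / m   # half-kick
        x += dt * v             # drift
        f = force(x)
        v += 0.5 * dt * f / m   # half-kick
    return x, v

k = 1.0  # harmonic bond stiffness (illustrative)
x, v = velocity_verlet(1.0, 0.0, lambda x: -k * x)
# a thermodynamic check: total energy stays near the initial 0.5
energy = 0.5 * v * v + 0.5 * k * x * x
```

    Symplectic integrators like this one are the standard choice in MD codes precisely because the conserved quantities drift very little over long runs.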

  12. Runtime System Library for Parallel Weather Modules

    Energy Science and Technology Software Center (OSTI)

    1997-07-22

    RSL is a Fortran-callable runtime library for use in implementing regular-grid weather forecast models, with nesting, on scalable distributed memory parallel computers. It provides high-level routines for finite-difference stencil communications and inter-domain exchange of data for nested forcing and feedback. RSL supports a unique point-wise domain-decomposition strategy to facilitate load-balancing.
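
    A sketch of the finite-difference stencil communication RSL automates: each subdomain exchanges one-cell halos with its neighbors before applying the stencil. This toy is serialized with plain Python lists (`exchange_halos` and `three_point` are invented names); RSL does the equivalent with message passing on distributed memory.

```python
def exchange_halos(subdomains):
    """Fill each subdomain's ghost cells ([0] and [-1]) from its
    neighbors' edge interior cells; zero at the physical boundary."""
    n = len(subdomains)
    for i, sub in enumerate(subdomains):
        sub[0] = subdomains[i - 1][-2] if i > 0 else 0.0
        sub[-1] = subdomains[i + 1][1] if i < n - 1 else 0.0

def three_point(sub):
    # apply (u[i-1] + u[i] + u[i+1]) / 3 to interior cells, using the halos
    return [(sub[i - 1] + sub[i] + sub[i + 1]) / 3 for i in range(1, len(sub) - 1)]

# two subdomains of a 1-D field, each with one ghost cell on either side
a = [0.0, 1.0, 2.0, 3.0, 0.0]
b = [0.0, 4.0, 5.0, 6.0, 0.0]
exchange_halos([a, b])
out = three_point(a) + three_point(b)
```

    The stencil result at the seam (between 3.0 and 4.0) comes out the same as on an undecomposed grid, which is the whole point of the halo exchange.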

  13. Parallel Heuristics for Scalable Community Detection

    SciTech Connect (OSTI)

    Lu, Howard; Kalyanaraman, Anantharaman; Halappanavar, Mahantesh; Choudhury, Sutanay

    2014-05-17

    Community detection has become a fundamental operation in numerous graph-theoretic applications. It is used to reveal natural divisions that exist within real world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed by Blondel et al. in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability to problems that can be solved on desktops. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose multiple heuristics that are designed to break the sequential barrier. Our heuristics are agnostic to the underlying parallel architecture. For evaluation purposes, we implemented our heuristics on shared memory (OpenMP) and distributed memory (MapReduce-MPI) machines, and tested them over real world graphs derived from multiple application domains (internet, biological, natural language processing). Experimental results demonstrate the ability of our heuristics to converge to high modularity solutions comparable to those output by the serial algorithm in nearly the same number of iterations, while also drastically reducing time to solution.
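
    The objective the Louvain heuristic optimizes is modularity, Q = (1/2m) Σ_ij [A_ij - k_i k_j / 2m] δ(c_i, c_j). The sketch below evaluates Q on a toy graph (the graph and partition are invented for illustration); the paper's parallel heuristics change which vertices may move concurrently, not the objective itself.

```python
def modularity(edges, community):
    """Newman modularity of an undirected, unweighted graph given as an
    edge list, for a vertex -> community assignment."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for u, v in edges:  # intra-community edge fraction
        if community[u] == community[v]:
            q += 1.0 / m
    for u in deg:       # expected fraction under the null model
        for v in deg:
            if community[u] == community[v]:
                q -= deg[u] * deg[v] / (4.0 * m * m)
    return q

# two triangles joined by a single bridge edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})  # one community per triangle
bad = modularity(edges, {i: 0 for i in range(6)})               # everything in one community
```

    Splitting at the bridge scores higher than lumping everything together, which is the signal the greedy vertex moves chase.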

  14. Parallel Integrated Thermal Management - Energy Innovation Portal

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Vehicles and Fuels Vehicles and Fuels Early Stage R&D Early Stage R&D Find More Like This Return to Search Parallel Integrated Thermal Management National Renewable Energy Laboratory Contact NREL About This Technology Technology Marketing Summary Many current cooling systems for hybrid electric vehicles (HEVs) with a high power electric drive system utilize a low temperature liquid cooling loop for cooling the power electronics system and electric machines associated with the electric

  15. FORTRAN Extensions for Modular Parallel Processing

    Energy Science and Technology Software Center (OSTI)

    1996-01-12

    FORTRAN M is a small set of extensions to FORTRAN that supports a modular approach to the construction of sequential and parallel programs. FORTRAN M programs use channels to plug together processes which may be written in FORTRAN M or FORTRAN 77. Processes communicate by sending and receiving messages on channels. Channels and processes can be created dynamically, but programs remain deterministic unless specialized nondeterministic constructs are used.

  16. FETI Prime Domain Decomposition-based Parallel Iterative Solver Library Ver. 1.0

    Energy Science and Technology Software Center (OSTI)

    2003-09-15

    FETI Prime is a library for the iterative solution of linear equations in solid and structural mechanics. The algorithm employs preconditioned conjugate gradients, with a domain decomposition-based preconditioner. The software is written in C++ and is designed for use with massively parallel computers, using MPI. The algorithm is based on the FETI-DP method, with additional capabilities for handling constraint equations, as well as interfacing with the Salinas structural dynamics code and the Finite Element Interface (FEI) library. Practical Application: FETI Prime is designed for use with finite element-based simulation codes for solid and structural mechanics. The solver uses element matrices, connectivity information, nodal information, and force vectors computed by the host code and provides back the solution to the linear system of equations, to the user-specified level of accuracy. The library is compiled with the host code and becomes an integral part of the host code executable.

  17. Xyce(™) Parallel Electronic Simulator

    Energy Science and Technology Software Center (OSTI)

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel, and uses modern solution algorithms, dynamic parallel load-balancing, and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over the network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.
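
    A minimal sketch of nodal analysis as the record describes it: enforce Kirchhoff's current law at each non-ground node and solve the resulting linear system. For a purely resistive circuit the problem is linear, so the Newton iteration collapses to a single solve; the circuit values and helper names below are illustrative, not Xyce internals.

```python
def solve2x2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - b[1] * a[0][1]) / det,
            (b[1] * a[0][0] - b[0] * a[1][0]) / det]

# 1 mA current source into node 1; R1 from node 1 to node 2; R2 from node 2 to ground
I, R1, R2 = 1e-3, 1000.0, 2000.0
G = [[1 / R1, -1 / R1],
     [-1 / R1, 1 / R1 + 1 / R2]]  # nodal conductance ("stamp") matrix
rhs = [I, 0.0]                    # KCL right-hand side: injected currents
v1, v2 = solve2x2(G, rhs)         # node voltages: 3 V and 2 V
```

    With nonlinear devices, G and rhs become the Jacobian and residual of the DAE system, re-stamped at every Newton step, which is exactly where the sparse-direct or Trilinos iterative solvers enter.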

  18. Xyce parallel electronic simulator : reference guide.

    SciTech Connect (OSTI)

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single-processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.

  19. A Pervasive Parallel Processing Framework For Data Visualization And Analysis At Extreme Scale Final Scientific and Technical Report

    SciTech Connect (OSTI)

    Geveci, Berk

    2014-10-31

    The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends indicate that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today's distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
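
    The worklet idea above can be sketched as a stateless per-element operation handed to an executor. The executor below is serial and the names (`magnitude_worklet`, `invoke`) are invented for illustration; on an exascale machine the same worklet would be mapped onto a massive number of lightweight threads.

```python
def magnitude_worklet(point):
    """Stateless: the output depends only on this one input element,
    so every invocation can run concurrently with every other."""
    x, y, z = point
    return (x * x + y * y + z * z) ** 0.5

def invoke(worklet, field):
    # the executor: apply the worklet to each element independently
    return [worklet(p) for p in field]

mags = invoke(magnitude_worklet, [(3, 4, 0), (1, 2, 2)])
```

    Contrast with the pipeline model: a filter holds state and runs when a pipeline request arrives, whereas a worklet runs whenever it is invoked on an element, which is what removes the per-thread overhead.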

  20. Perturbation Theory of Massive Yang-Mills Fields

    DOE R&D Accomplishments [OSTI]

    Veltman, M.

    1968-08-01

    Perturbation theory of massive Yang-Mills fields is investigated with the help of the Bell-Treiman transformation. Diagrams containing one closed loop are shown to be convergent if there are more than four external vector boson lines. The investigation presented does not exclude the possibility that the theory is renormalizable.

  1. Characterizing the parallelism in rule-based expert systems

    SciTech Connect (OSTI)

    Douglass, R.J.

    1984-01-01

    A brief review of two classes of rule-based expert systems is presented, followed by a detailed analysis of potential sources of parallelism at the production or rule level, the subrule level (including match, select, and act parallelism), and at the search level (including AND, OR, and stream parallelism). The potential amount of parallelism from each source is discussed and characterized in terms of its granularity, inherent serial constraints, efficiency, speedup, dynamic behavior, and communication volume, frequency, and topology. Subrule parallelism will yield, at best, two- to tenfold speedup, and rule level parallelism will yield a modest speedup on the order of 5 to 10 times. Rule level can be combined with OR, AND, and stream parallelism in many instances to yield further parallel speedups.
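
    The serial constraints mentioned above cap achievable speedup; a standard way to quantify such a cap is Amdahl's law, sketched below. The 90% parallel fraction is an illustrative number chosen to reproduce a tenfold ceiling, not a figure from the paper.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)

# If 90% of the work parallelizes, even an unbounded processor count
# gives at most a tenfold speedup, consistent in spirit with the modest
# rule-level speedups characterized above.
limit = amdahl_speedup(0.9, 10 ** 9)
```

    The inherent serial fraction, not the processor count, is what bounds match-, select-, and act-level parallelism in these systems.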

  2. Parallelizing AT with MatlabMPI

    SciTech Connect (OSTI)

    Li, Evan Y.; /Brown U. /SLAC

    2011-06-22

    The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations, rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, which set up the necessary pre-requisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate highly efficient speed increases per processor in AT's beam-tracking functions. Extrapolating from these results, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well-understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness.
Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while expending minimal communication overhead with the integrated MATLAB library.

  3. A brief parallel I/O tutorial.

    SciTech Connect (OSTI)

    Ward, H. Lee

    2010-03-01

    This document provides common best practices for the efficient utilization of parallel file systems for analysts and application developers. A multi-program, parallel supercomputer is able to provide effective compute power by aggregating a host of lower-power processors using a network. The idea, in general, is that one either constructs the application to distribute parts to the different nodes and processors available and then collects the result (a parallel application), or one launches a large number of small jobs, each doing similar work on different subsets (a campaign). The I/O system on these machines is usually implemented as a tightly-coupled, parallel application itself. It is providing the concept of a 'file' to the host applications. The 'file' is an addressable store of bytes and that address space is global in nature. In essence, it is providing a global address space. Beyond the simple reality that the I/O system is normally composed of a small, less capable, collection of hardware, that concept of a global address space will cause problems if not very carefully utilized. How much of a problem and the ways in which those problems manifest will be different, but that it is problem prone has been well established. Worse, the file system is a shared resource on the machine - a system service. What an application does when it uses the file system impacts all users. It is not the case that some portion of the available resource is reserved. Instead, the I/O system responds to requests by scheduling and queuing based on instantaneous demand. Using the system well contributes to the overall throughput on the machine. From a solely self-centered perspective, using it well reduces the time that the application or campaign is subject to impact by others. 
The developer's goal should be to accomplish I/O in a way that minimizes interaction with the I/O system, maximizes the amount of data moved per call, and provides the I/O system the most information about the I/O transfer per request.
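
    The guidance above (fewer, larger requests) can be sketched as a small write aggregator. The in-memory sink, the `AggregatingWriter` class, and the threshold value are all invented for illustration; real codes would use MPI-IO collective buffering or a similar facility.

```python
import io

class AggregatingWriter:
    """Buffer many small writes and emit them as a few large requests."""

    def __init__(self, sink, threshold=1 << 20):
        self.sink, self.threshold = sink, threshold
        self.buf, self.flushes = bytearray(), 0

    def write(self, data: bytes):
        self.buf += data
        if len(self.buf) >= self.threshold:
            self.flush()

    def flush(self):
        if self.buf:
            self.sink.write(bytes(self.buf))  # one large request to the I/O system
            self.flushes += 1
            self.buf.clear()

sink = io.BytesIO()
w = AggregatingWriter(sink, threshold=4096)
for _ in range(1000):
    w.write(b"x" * 64)  # 1000 small 64-byte writes from the application
w.flush()
# the shared I/O system saw only w.flushes requests, not 1000
```

    The same data moves either way; what changes is the number of requests the shared file system must schedule, which is exactly the interaction the text says to minimize.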

  4. Parallel State Estimation Assessment with Practical Data

    SciTech Connect (OSTI)

    Chen, Yousu; Jin, Shuangshuang; Rice, Mark J.; Huang, Zhenyu

    2014-10-31

    This paper presents a full-cycle parallel state estimation (PSE) implementation using a preconditioned conjugate gradient algorithm. The developed code is able to solve large-size power system state estimation within 5 seconds using real-world data, comparable to the Supervisory Control And Data Acquisition (SCADA) rate. This achievement allows the operators to know the system status much faster to help improve grid reliability. Case study results of the Bonneville Power Administration (BPA) system with real measurements are presented. The benefits of fast state estimation are also discussed.
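
    A sketch of the algorithm family named above: conjugate gradients with a simple Jacobi (diagonal) preconditioner. The matrix here is a tiny symmetric positive definite stand-in, not power-system data, and the code is a generic textbook PCG, not the paper's implementation.

```python
def pcg(A, b, tol=1e-10, max_iter=100):
    """Jacobi-preconditioned conjugate gradient for SPD dense A (lists of lists)."""
    n = len(b)
    mv = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = b[:]                                 # residual b - A x, with x = 0
    z = [r[i] / A[i][i] for i in range(n)]   # apply Jacobi preconditioner M = diag(A)
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = mv(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]  # symmetric positive definite
b = [1.0, 2.0]
x = pcg(A, b)
```

    The appeal for state estimation is that every operation here (matrix-vector products, dot products, diagonal scaling) parallelizes naturally, which is what makes SCADA-rate solves feasible.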

  5. SPRNG Scalable Parallel Random Number Generator LIbrary

    Energy Science and Technology Software Center (OSTI)

    2010-03-16

    This revision corrects some errors in SPRNG 1. Users of newer SPRNG versions can obtain the corrected files and build their version with them. This version also improves the scalability of some of the application-based tests in the SPRNG test suite. It also includes an interface to a parallel Mersenne Twister, so that users who install the Mersenne Twister can test that generator with the SPRNG test suite and also use some SPRNG features with it.
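
    The core pattern behind libraries like SPRNG is giving each parallel task its own reproducible stream. The sketch below illustrates that pattern with the standard library's generator and simple per-task seeding; SPRNG's parameterized generators give much stronger stream-independence guarantees than this, and `spawn_streams` is an invented name.

```python
import random

def spawn_streams(n_tasks, root_seed=2024):
    """One generator per task, seeded deterministically from the root seed
    and the task rank, so results are reproducible per task count."""
    return [random.Random(f"{root_seed}-{rank}") for rank in range(n_tasks)]

streams = spawn_streams(4)
draws = [s.random() for s in streams]           # each task draws from its own stream
again = [s.random() for s in spawn_streams(4)]  # reproducible across runs
```

    A test suite like SPRNG's would then check, among other things, that draws from distinct streams show no cross-stream correlations, which naive seeding schemes can fail.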

  6. Carbothermic reduction with parallel heat sources

    DOE Patents [OSTI]

    Troup, Robert L. (Murrysville, PA); Stevenson, David T. (Washington Township, Washington County, PA)

    1984-12-04

    Disclosed are an apparatus and a method of carbothermic direct reduction for producing an aluminum alloy from a raw material mix including aluminum oxide, silicon oxide, and carbon, wherein parallel heat sources are provided by a combustion heat source and by an electrical heat source at essentially the same position in the reactor, such as at the same horizontal level in the path of a gravity-fed moving bed in a vertical reactor. The present invention includes providing at least 79% of the heat energy required in the process by the electrical heat source.

  7. Parallel heater system for subsurface formations

    DOE Patents [OSTI]

    Harris, Christopher Kelvin (Houston, TX); Karanikas, John Michael (Houston, TX); Nguyen, Scott Vinh (Houston, TX)

    2011-10-25

    A heating system for a subsurface formation is disclosed. The system includes a plurality of substantially horizontally oriented or inclined heater sections located in a hydrocarbon containing layer in the formation. At least a portion of two of the heater sections are substantially parallel to each other. The ends of at least two of the heater sections in the layer are electrically coupled to a substantially horizontal, or inclined, electrical conductor oriented substantially perpendicular to the ends of the at least two heater sections.

  8. SimFS: A Large Scale Parallel File System Simulator

    Energy Science and Technology Software Center (OSTI)

    2011-08-30

    The software provides both framework and tools to simulate a large-scale parallel file system such as Lustre.

  9. Idaho Site D&D Crew Uses Specialized Tools to Cut Apart Massive...

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    D&D Crew Uses Specialized Tools to Cut Apart Massive Tank in Demolition Project Idaho Site D&D Crew Uses Specialized Tools to Cut Apart Massive Tank in Demolition Project November...

  10. Switch for serial or parallel communication networks

    DOE Patents [OSTI]

    Crosette, Dario B. (DeSoto, TX)

    1994-01-01

    A communication switch apparatus, and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system, having a very low software processing overhead in order to accommodate random bursts of high-density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network is coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination.

  11. Switch for serial or parallel communication networks

    DOE Patents [OSTI]

    Crosette, D.B.

    1994-07-19

    A communication switch apparatus, and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system, having a very low software processing overhead in order to accommodate random bursts of high-density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network is coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination. 9 figs.

  12. Translation invariant time-dependent solutions to massive gravity II

    SciTech Connect (OSTI)

    Mourad, J.; Steer, D.A. E-mail: steer@apc.univ-paris7.fr

    2014-06-01

    This paper is a sequel to JCAP 12 (2013) 004 and is also devoted to translation-invariant solutions of ghost-free massive gravity in its moving frame formulation. Here we consider a mass term which is linear in the vielbein (corresponding to a β{sub 3} term in the 4D metric formulation) in addition to the cosmological constant. We determine explicitly the constraints, and from the initial value formulation show that the time-dependent solutions can have singularities at a finite time. Although the constraints give, as in the β{sub 1} case, the correct number of degrees of freedom for a massive spin-two field, we show that the lapse function can change sign at a finite time, causing a singular time evolution. This is very different to the β{sub 1} case, where time evolution is always well defined. We conclude that the β{sub 3} mass term can be pathological and should be treated with care.

  13. INTERNAL GRAVITY WAVES IN MASSIVE STARS: ANGULAR MOMENTUM TRANSPORT

    SciTech Connect (OSTI)

    Rogers, T. M.; Lin, D. N. C.; McElwaine, J. N.; Lau, H. H. B. E-mail: lin@ucolick.org E-mail: hblau@astro.uni-bonn.de

    2013-07-20

    We present numerical simulations of internal gravity waves (IGW) in a star with a convective core and extended radiative envelope. We report on amplitudes, spectra, dissipation, and consequent angular momentum transport by such waves. We find that these waves are generated efficiently and transport angular momentum on short timescales over large distances. We show that, as in Earth's atmosphere, IGW drive equatorial flows which change magnitude and direction on short timescales. These results have profound consequences for the observational inferences of massive stars, as well as their long term angular momentum evolution. We suggest IGW angular momentum transport may explain many observational mysteries, such as: the misalignment of hot Jupiters around hot stars, the Be class of stars, nitrogen enrichment anomalies in massive stars, and the non-synchronous orbits of interacting binaries.

  14. The halo model in a massive neutrino cosmology

    SciTech Connect (OSTI)

    Massara, Elena; Villaescusa-Navarro, Francisco; Viel, Matteo E-mail: villaescusa@oats.inaf.it

    2014-12-01

    We provide a quantitative analysis of the halo model in the context of massive neutrino cosmologies. We discuss all the ingredients necessary to model the non-linear matter and cold dark matter power spectra and compare with the results of N-body simulations that incorporate massive neutrinos. Our neutrino halo model is able to capture the non-linear behavior of matter clustering with a ~20% accuracy up to very non-linear scales of k=10 h/Mpc (which would be affected by baryon physics). The largest discrepancies arise in the range k=0.5-1 h/Mpc, where the 1-halo and 2-halo terms are comparable, and are present also in a massless neutrino cosmology. However, at scales k<0.2 h/Mpc our neutrino halo model agrees with the results of N-body simulations at the level of 8% for total neutrino masses of <0.3 eV. We also model the neutrino non-linear density field as a sum of a linear and clustered component and predict the neutrino power spectrum and the cold dark matter-neutrino cross-power spectrum up to k=1 h/Mpc with ~30% accuracy. For masses below 0.15 eV the neutrino halo model captures the neutrino-induced suppression, cast in terms of matter power ratios between massive and massless scenarios, with a 2% agreement with the results of N-body/neutrino simulations. Finally, we provide a simple application of the halo model: the computation of the clustering of galaxies, in massless and massive neutrino cosmologies, using a simple Halo Occupation Distribution scheme and our halo model extension.

  15. THE ROLE OF THE MAGNETOROTATIONAL INSTABILITY IN MASSIVE STARS

    SciTech Connect (OSTI)

    Wheeler, J. Craig; Kagan, Daniel; Chatzopoulos, Emmanouil

    2015-01-20

    The magnetorotational instability (MRI) is key to physics in accretion disks and is widely considered to play some role in massive star core collapse. Models of rotating massive stars naturally develop very strong shear at composition boundaries, a necessary condition for MRI instability, and the MRI is subject to triply diffusive destabilizing effects in radiative regions. We have used the MESA stellar evolution code to compute magnetic effects due to the Spruit-Tayler (ST) mechanism and the MRI, separately and together, in a sample of massive star models. We find that the MRI can be active in the later stages of massive star evolution, leading to mixing effects that are not captured in models that neglect the MRI. The MRI and related magnetorotational effects can move models of given zero-age main sequence mass across ''boundaries'' from degenerate CO cores to degenerate O/Ne/Mg cores and from degenerate O/Ne/Mg cores to iron cores, thus affecting the final evolution and the physics of core collapse. The MRI acting alone can slow the rotation of the inner core in general agreement with the observed ''initial'' rotation rates of pulsars. The MRI analysis suggests that localized fields ~10{sup 12} G may exist at the boundary of the iron core. With both the ST and MRI mechanisms active in the 20 M{sub ☉} model, we find that the helium shell mixes entirely out into the envelope. Enhanced mixing could yield a population of yellow or even blue supergiant supernova progenitors that would not be standard SN IIP.

  16. Weight Loss Regime for Massive Low Temperature Electrons | The Ames

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Laboratory Weight Loss Regime for Massive Low Temperature Electrons A compound made out of ytterbium (Yb), platinum (Pt), and bismuth (Bi) offers researchers the opportunity to watch the birth of magnetic behavior by applying small changes in magnetic field or temperature. Despite the electrons having effective masses of nearly 10,000 times their normal mass when YbPtBi becomes magnetic, researchers have been able to monitor its quantum oscillations, key for determining important electronic

  17. A symmetric approach to the massive nonlinear sigma model

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Ferrari, Ruggero

    2011-09-28

    In the present study we extend to the massive case the procedure of divergences subtraction, previously introduced for the massless nonlinear sigma model (D = 4). Perturbative expansion in the number of loops is successfully constructed. The resulting theory depends on the Spontaneous Symmetry Breaking parameter v, on the mass m and on the radiative correction parameter Λ. Fermions are not considered in the present work. SU(2) ⊗ SU(2) is the group used.

  18. Hybrid and Parallel Domain-Decomposition Methods Development to Enable Monte Carlo for Reactor Analyses

    SciTech Connect (OSTI)

    Wagner, John C; Mosher, Scott W; Evans, Thomas M; Peplow, Douglas E.; Turner, John A

    2011-01-01

    This paper describes code and methods development at the Oak Ridge National Laboratory focused on enabling high-fidelity, large-scale reactor analyses with Monte Carlo (MC). Current state-of-the-art tools and methods used to perform real commercial reactor analyses have several undesirable features, the most significant of which is the non-rigorous spatial decomposition scheme. Monte Carlo methods, which allow detailed and accurate modeling of the full geometry and are considered the gold standard for radiation transport solutions, are playing an ever-increasing role in correcting and/or verifying the deterministic, multi-level spatial decomposition methodology in current practice. However, the prohibitive computational requirements associated with obtaining fully converged, system-wide solutions restrict the role of MC to benchmarking deterministic results at a limited number of state-points for a limited number of relevant quantities. The goal of this research is to change this paradigm by enabling direct use of MC for full-core reactor analyses. The most significant of the many technical challenges that must be overcome are the slow, non-uniform convergence of system-wide MC estimates and the memory requirements associated with detailed solutions throughout a reactor (problems involving hundreds of millions of different material and tally regions due to fuel irradiation, temperature distributions, and the needs associated with multi-physics code coupling). To address these challenges, our research has focused on the development and implementation of (1) a novel hybrid deterministic/MC method for determining high-precision fluxes throughout the problem space in k-eigenvalue problems and (2) an efficient MC domain-decomposition (DD) algorithm that partitions the problem phase space onto multiple processors for massively parallel systems, with statistical uncertainty estimation. 
The hybrid method development is based on an extension of the FW-CADIS method, which attempts to achieve uniform statistical uncertainty throughout a designated problem space. The MC DD development is being implemented in conjunction with the Denovo deterministic radiation transport package to have direct access to the 3-D, massively parallel discrete-ordinates solver (to support the hybrid method) and the associated parallel routines and structure. This paper describes the hybrid method, its implementation, and initial testing results for a realistic 2-D quarter core pressurized-water reactor model and also describes the MC DD algorithm and its implementation.
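The domain-decomposition idea described above can be illustrated with a toy spatial partition. This is a hedged sketch only, not the ORNL/Denovo implementation: the grid dimensions, the processor-grid layout, and the function name are invented for illustration.

```python
# Hypothetical sketch of MC domain decomposition: assign each spatial
# cell of a 2-D reactor grid to one of px*py processor domains so that
# tally and material memory is distributed across processors.

def decompose(nx, ny, px, py):
    """Map each (i, j) cell of an nx-by-ny grid to a rank on a px-by-py
    processor grid; returns {rank: [(i, j), ...]}."""
    assert nx % px == 0 and ny % py == 0, "assume an evenly divisible grid"
    owner = {}
    for i in range(nx):
        for j in range(ny):
            # contiguous blocks: row-block index times py plus column-block index
            rank = (i // (nx // px)) * py + (j // (ny // py))
            owner.setdefault(rank, []).append((i, j))
    return owner

domains = decompose(nx=8, ny=8, px=2, py=2)
# each of the 4 ranks owns a contiguous 4x4 block of cells
```

In a real domain-decomposed MC code each rank would also track particles crossing into neighboring domains and estimate statistical uncertainty per domain; this sketch shows only the ownership map.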

  19. Hybrid and Parallel Domain-Decomposition Methods Development to Enable Monte Carlo for Reactor Analyses

    SciTech Connect (OSTI)

    Wagner, John C; Mosher, Scott W; Evans, Thomas M; Peplow, Douglas E.; Turner, John A

    2010-01-01

    This paper describes code and methods development at the Oak Ridge National Laboratory focused on enabling high-fidelity, large-scale reactor analyses with Monte Carlo (MC). Current state-of-the-art tools and methods used to perform ''real'' commercial reactor analyses have several undesirable features, the most significant of which is the non-rigorous spatial decomposition scheme. Monte Carlo methods, which allow detailed and accurate modeling of the full geometry and are considered the ''gold standard'' for radiation transport solutions, are playing an ever-increasing role in correcting and/or verifying the deterministic, multi-level spatial decomposition methodology in current practice. However, the prohibitive computational requirements associated with obtaining fully converged, system-wide solutions restrict the role of MC to benchmarking deterministic results at a limited number of state-points for a limited number of relevant quantities. The goal of this research is to change this paradigm by enabling direct use of MC for full-core reactor analyses. The most significant of the many technical challenges that must be overcome are the slow, non-uniform convergence of system-wide MC estimates and the memory requirements associated with detailed solutions throughout a reactor (problems involving hundreds of millions of different material and tally regions due to fuel irradiation, temperature distributions, and the needs associated with multi-physics code coupling). To address these challenges, our research has focused on the development and implementation of (1) a novel hybrid deterministic/MC method for determining high-precision fluxes throughout the problem space in k-eigenvalue problems and (2) an efficient MC domain-decomposition (DD) algorithm that partitions the problem phase space onto multiple processors for massively parallel systems, with statistical uncertainty estimation. 
The hybrid method development is based on an extension of the FW-CADIS method, which attempts to achieve uniform statistical uncertainty throughout a designated problem space. The MC DD development is being implemented in conjunction with the Denovo deterministic radiation transport package to have direct access to the 3-D, massively parallel discrete-ordinates solver (to support the hybrid method) and the associated parallel routines and structure. This paper describes the hybrid method, its implementation, and initial testing results for a realistic 2-D quarter core pressurized-water reactor model and also describes the MC DD algorithm and its implementation.

  20. The evolutionary tracks of young massive star clusters

    SciTech Connect (OSTI)

    Pfalzner, S.; Steinhausen, M.; Vincke, K.; Menten, K.; Parmentier, G.

    2014-10-20

    Stars mostly form in groups consisting of a few dozen to several tens of thousands of members. For 30 years, theoretical models have provided a basic concept of how such star clusters form and develop: they originate from the gas and dust of collapsing molecular clouds. The conversion from gas to stars being incomplete, the leftover gas is expelled, leading to cluster expansion and stars becoming unbound. Observationally, a direct confirmation of this process has proved elusive, which is attributed to the diversity of the properties of forming clusters. Here we take into account that the true cluster masses and sizes are masked, initially by the surface density of the background and later by the still present unbound stars. Based on the recent observational finding that in a given star-forming region the star formation efficiency depends on the local density of the gas, we use an analytical approach combined with N-body simulations to reveal evolutionary tracks for young massive clusters covering the first 10 Myr. Just as the Hertzsprung-Russell diagram is a measure of the evolution of stars, these tracks provide equivalent information for clusters. Like stars, massive clusters form and develop faster than their lower-mass counterparts, explaining why so few massive cluster progenitors are found.

  1. Sub-Second Parallel State Estimation

    SciTech Connect (OSTI)

    Chen, Yousu; Rice, Mark J.; Glaesemann, Kurt R.; Wang, Shaobu; Huang, Zhenyu

    2014-10-31

    This report describes the performance of Pacific Northwest National Laboratory's (PNNL's) sub-second parallel state estimation (PSE) tool using utility data from the Bonneville Power Administration (BPA) and discusses the benefits of the fast computational speed for power system applications. The test data were provided by BPA. They are two days' worth of hourly snapshots that include power system data and measurement sets in a commercial tool format. These data are extracted from the commercial tool box and fed into the PSE tool. With the help of advanced solvers, the PSE tool is able to solve each BPA hourly state estimation problem within one second, which is more than 10 times faster than today's commercial tool. This improved computational performance can help increase the reliability value of state estimation in many aspects: (1) the shorter the time required for execution of state estimation, the more time remains for operators to take appropriate actions, and/or to apply automatic or manual corrective control actions. This increases the chances of arresting or mitigating the impact of cascading failures; (2) the SE can be executed multiple times within the time allowance. Therefore, the robustness of SE can be enhanced by repeating the execution of the SE with adaptive adjustments, including removing bad data and/or adjusting different initial conditions, to compute a better estimate within the same time as a traditional state estimator's single estimate. The sub-second SE offers other benefits as well: the PSE results can potentially be used in local and/or wide-area automatic corrective control actions that are currently dependent on raw measurements, minimizing the impact of bad measurements and providing opportunities to enhance the power grid reliability and efficiency. 
PSE also can enable other advanced tools that rely on SE outputs and could be used to further improve operators' actions and automated controls to mitigate effects of severe events on the grid. The power grid continues to grow and the number of measurements is increasing at an accelerated rate due to the variety of smart grid devices being introduced. A parallel state estimation implementation will have better performance than traditional, sequential state estimation by utilizing the power of high performance computing (HPC). This increased performance positions parallel state estimators as valuable tools for operating the increasingly more complex power grid.

  2. Broadcasting a message in a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Faraj, Ahmad A

    2013-04-16

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer that includes: transmitting, by the logical root to all of the nodes directly connected to the logical root, a message; and for each node except the logical root: receiving the message; if that node is the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received; if that node received the message from a parent node and if that node is not a leaf node, then transmitting the message to all of the child nodes; and if that node received the message from a child node and if that node is not the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received and transmitting the message to the parent node.
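The per-node forwarding rule in the claim above can be sketched as a small function. This is an illustrative model only: the dictionary tree representation and the node names in the example are invented, and the logical root's initial transmission to its directly connected nodes is assumed to have already happened.

```python
# Hedged sketch of the broadcast forwarding rule: given where a message
# came from, decide which neighbors a node forwards it to.

def forward_targets(node, sender, tree, physical_root):
    """Return the neighbors `node` forwards a received message to.
    `tree` maps each node to its list of children."""
    children = tree.get(node, [])
    parent = next((p for p, cs in tree.items() if node in cs), None)
    if node == physical_root:
        # physical root: forward to all children except the sender
        return [c for c in children if c != sender]
    if sender == parent:
        # received from the parent: forward to all children (none if leaf)
        return list(children)
    # received from a child: forward to the other children and the parent
    out = [c for c in children if c != sender]
    if parent is not None:
        out.append(parent)
    return out

tree = {'root': ['a', 'b'], 'a': ['c', 'd']}
# 'root' hears the message from its child 'a', so it forwards only to 'b'
targets = forward_targets('root', 'a', tree, physical_root='root')
```

The three branches correspond directly to the three cases in the claim; a leaf that hears from its parent forwards to nobody, which terminates the broadcast.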

  3. Intranode data communications in a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

    2013-07-23

    Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.
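The key property in the claim above is that a sender may deposit a message into a receiver's buffer before the receiver has been initialized. A minimal sketch, with plain Python dictionaries standing in for the shared-memory region and all names invented:

```python
# Illustrative model of pre-established per-process message buffers in a
# node's shared memory; sends do not check receiver initialization.

class SharedRegion:
    def __init__(self, expected_processes):
        # one message buffer per process that will run on this node
        self.buffers = {pid: [] for pid in expected_processes}

    def send(self, dest_pid, message):
        # store directly into the destination's buffer, without
        # determining whether the destination has been initialized
        self.buffers[dest_pid].append(message)

    def attach(self, pid):
        # a newly initialized process retrieves a pointer (here, a
        # reference) to its own buffer, where pending messages wait
        return self.buffers[pid]

region = SharedRegion(expected_processes=[0, 1])
region.send(1, "hello before init")   # process 1 has not started yet
inbox = region.attach(1)              # process 1 initializes later
```

Because the buffers exist before the processes do, no handshake is needed between sender startup and receiver startup.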

  4. Internode data communications in a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J.; Blocksome, Michael A.; Miller, Douglas R.; Parker, Jeffrey J.; Ratterman, Joseph D.; Smith, Brian E.

    2013-09-03

    Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
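The internode variant differs from the intranode case in that the holding buffers live in the messaging unit's own memory, and the message is copied into main memory when the process initializes. A hedged sketch with invented class and field names:

```python
# Toy model of messaging-unit buffering: messages that arrive before a
# process exists are held in messaging-unit memory, then copied into a
# buffer the process establishes in main memory at initialization.

class MessagingUnit:
    def __init__(self, expected_processes):
        # allocated at compute-node boot time, one buffer per process
        self.mu_buffers = {pid: [] for pid in expected_processes}

    def receive(self, dest_pid, message):
        # destination process may not be running yet; just store it
        self.mu_buffers[dest_pid].append(message)

class Process:
    def __init__(self, pid, mu):
        # establish a buffer in main memory, then copy any pending
        # messages out of the messaging unit's memory
        self.main_memory_buffer = list(mu.mu_buffers[pid])
        mu.mu_buffers[pid].clear()

mu = MessagingUnit(expected_processes=[7])
mu.receive(7, "early message")   # arrives before process 7 starts
p = Process(7, mu)               # init copies the message to main memory
```

The copy step is what decouples network arrival time from process start time on the receiving node.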

  5. Internode data communications in a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.

  6. Intranode data communications in a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

    2014-01-07

    Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a computer node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.

  7. Broadcasting a message in a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Faraj, Daniel A

    2014-11-18

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer that includes: transmitting, by the logical root to all of the nodes directly connected to the logical root, a message; and for each node except the logical root: receiving the message; if that node is the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received; if that node received the message from a parent node and if that node is not a leaf node, then transmitting the message to all of the child nodes; and if that node received the message from a child node and if that node is not the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received and transmitting the message to the parent node.

  8. Optimized data communications in a parallel computer

    DOE Patents [OSTI]

    Faraj, Daniel A.

    2014-08-19

    A parallel computer includes nodes that include a network adapter that couples the node in a point-to-point network and supports communications in opposite directions of each dimension. Optimized communications include: receiving, by a network adapter of a receiving compute node, a packet--from a source direction--that specifies a destination node and deposit hints. Each hint is associated with a direction within which the packet is to be deposited. If a hint indicates the packet to be deposited in the opposite direction: the adapter delivers the packet to an application on the receiving node; forwards the packet to a next node in the opposite direction if the receiving node is not the destination; and forwards the packet to a node in a direction of a subsequent dimension if the hints indicate that the packet is to be deposited in the direction of the subsequent dimension.
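The deposit-hint behavior above can be modeled on a single dimension: with the hint set, the packet is handed to the application at every node it traverses on the way to the destination, rather than only at the destination. The path list and function name below are invented simplifications.

```python
# Hedged toy model of "deposit" routing along one dimension of a
# point-to-point network.

def route_with_deposit(path, destination, deposit_hint):
    """Return the nodes where the packet is delivered to an application
    while traveling along `path` toward `destination`."""
    delivered = []
    for node in path:
        if deposit_hint or node == destination:
            delivered.append(node)   # adapter hands a copy to the app
        if node == destination:
            break                    # destination reached; stop forwarding
    return delivered

# packet enters at node 3 and travels toward node 1 with the hint set
hops = route_with_deposit(path=[3, 2, 1, 0], destination=1,
                          deposit_hint=True)
```

Without the hint only the destination receives the packet; with it, every intermediate node along that direction gets a copy, which is the "deposit" semantics the claim describes per dimension.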

  9. Parallel detecting, spectroscopic ellipsometers/polarimeters

    DOE Patents [OSTI]

    Furtak, Thomas E. (15927 W. Ellsworth, Golden, CO 80401)

    2002-01-01

    The parallel detecting spectroscopic ellipsometer/polarimeter sensor has no moving parts and operates in real-time for in-situ monitoring of the thin film surface properties of a sample within a processing chamber. It includes a multi-spectral source of radiation for producing a collimated beam of radiation directed towards the surface of the sample through a polarizer. The thus polarized collimated beam of radiation impacts and is reflected from the surface of the sample, thereby changing its polarization state due to the intrinsic material properties of the sample. The light reflected from the sample is separated into four separate polarized filtered beams, each having individual spectral intensities. Data about said four individual spectral intensities is collected within the processing chamber, and is transmitted into one or more spectrometers. The data of all four individual spectral intensities is then analyzed using transformation algorithms, in real-time.

  10. Link failure detection in a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J. (Rochester, MN); Blocksome, Michael A. (Rochester, MN); Megerian, Mark G. (Rochester, MN); Smith, Brian E. (Rochester, MN)

    2010-11-09

    Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.
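The group assignment in the claim above is a two-coloring of the mesh, so that every test message crosses a link between the two groups. A small sketch, with the 3x3 mesh size and names chosen arbitrarily:

```python
# Sketch of checkerboard group assignment for link failure detection in
# a rectangular mesh: adjacent nodes always land in different groups.

def assign_groups(rows, cols):
    """2-color the mesh; group = parity of (row + column)."""
    return {(r, c): (r + c) % 2 for r in range(rows) for c in range(cols)}

def neighbors(r, c, rows, cols):
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield (r + dr, c + dc)

groups = assign_groups(3, 3)
# every group-0 node sends a test message to each group-1 neighbor;
# group-1 nodes then report which expected messages never arrived
checks = {n: [m for m in neighbors(*n, 3, 3) if groups[m] == 1]
          for n, g in groups.items() if g == 0}
```

Because the coloring guarantees neighbors are in opposite groups, one round of sends from group 0 exercises every link in the mesh exactly once per direction tested.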

  11. Clock Agreement Among Parallel Supercomputer Nodes

    DOE Data Explorer [Office of Scientific and Technical Information (OSTI)]

    Jones, Terry R.; Koenig, Gregory A.

    2014-04-30

    This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines derive much of their computational capability from extreme node counts (over 18000 nodes in the case of the Titan machine). Time-agreement is commonly utilized by parallel programming applications and tools, distributed programming applications and tools, and system software. Our time-agreement measurements detail the degree of time variance between nodes and how that variance changes over time. The dataset includes empirical measurements and the accompanying spreadsheets.

  12. LAPACK BLAS Parallel BLAS ScaLAPACK

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Diagram of the ScaLAPACK software hierarchy: ScaLAPACK is layered on the PBLAS (Parallel BLAS) and LAPACK; the PBLAS combine the local-addressing BLAS with the global-addressing BLACS (Basic Linear Algebra Communication Subprograms); the BLACS in turn use message-passing primitives such as MPI or PVM. Related man pages: man intro_blas3, man intro_blacs, man intro_lapack, man intro_scalapack. The figure also illustrates the two-dimensional block-cyclic distribution of an M-by-N matrix in MB-by-NB blocks across a process grid.

  13. Clock Agreement Among Parallel Supercomputer Nodes

    DOE Data Explorer [Office of Scientific and Technical Information (OSTI)]

    Jones, Terry R.; Koenig, Gregory A.

    This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines derive much of their computational capability from extreme node counts (over 18000 nodes in the case of the Titan machine). Time-agreement is commonly utilized by parallel programming applications and tools, distributed programming applications and tools, and system software. Our time-agreement measurements detail the degree of time variance between nodes and how that variance changes over time. The dataset includes empirical measurements and the accompanying spreadsheets.

  14. Optimized data communications in a parallel computer

    DOE Patents [OSTI]

    Faraj, Daniel A

    2014-10-21

    A parallel computer includes nodes that include a network adapter that couples the node in a point-to-point network and supports communications in opposite directions of each dimension. Optimized communications include: receiving, by a network adapter of a receiving compute node, a packet--from a source direction--that specifies a destination node and deposit hints. Each hint is associated with a direction within which the packet is to be deposited. If a hint indicates the packet to be deposited in the opposite direction: the adapter delivers the packet to an application on the receiving node; forwards the packet to a next node in the opposite direction if the receiving node is not the destination; and forwards the packet to a node in a direction of a subsequent dimension if the hints indicate that the packet is to be deposited in the direction of the subsequent dimension.

  15. Parallelism of the SANDstorm hash algorithm.

    SciTech Connect (OSTI)

    Torgerson, Mark Dolan; Draelos, Timothy John; Schroeppel, Richard Crabtree

    2009-09-01

    Mainstream cryptographic hashing algorithms are not parallelizable. This limits their speed, prevents them from taking advantage of the current trend toward multi-core platforms, and restricts their usefulness as an authentication mechanism in secure communications. Sandia researchers have created a new cryptographic hashing algorithm, SANDstorm, which was specifically designed to take advantage of multi-core processing and be parallelizable on a wide range of platforms. This report describes a late-start LDRD effort to verify the parallelizability claims of the SANDstorm designers. We have shown, with operating code and bench testing, that the SANDstorm algorithm may be trivially parallelized on a wide range of hardware platforms. Implementations using OpenMP demonstrate a linear speedup with multiple cores. We have also shown significant performance gains with optimized C code and the use of assembly instructions to exploit particular platform capabilities.
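The structural property the report verifies is that independent pieces of the input can be hashed on separate cores. As a generic illustration only (this is NOT the SANDstorm algorithm, and SHA-256 stands in for the compression function), a tree hash shows the shape of such a parallelizable design:

```python
# Generic tree-hash sketch: leaf hashes are mutually independent, so
# they are the step that can run on separate cores; a final hash over
# the concatenated leaf digests combines them.

import hashlib

def tree_hash(data: bytes, block_size: int = 64) -> str:
    blocks = [data[i:i + block_size]
              for i in range(0, max(len(data), 1), block_size)]
    # each leaf hash depends only on its own block (parallelizable)
    leaves = [hashlib.sha256(b).digest() for b in blocks]
    # combining step is a single serial hash over the leaf digests
    return hashlib.sha256(b"".join(leaves)).hexdigest()

digest = tree_hash(b"x" * 300)   # 300 bytes -> 5 independent leaf hashes
```

A chained construction like plain SHA-256, by contrast, feeds each block's output into the next block's computation, which is exactly the dependency that prevents this kind of speedup.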

  16. Data communications for a collective operation in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Faraj, Daniel A.

    2015-11-19

    Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and bit masks; receiving in an origin endpoint of the PAMI a collective instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint; constructing a bit mask for the received collective instruction; selecting, from among the associated algorithms and bit masks, a data communications algorithm in dependence upon the constructed bit mask; and executing the collective instruction, transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.

  17. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-09-02

    Eager send data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints that specify a client, a context, and a task, including receiving an eager send data communications instruction with transfer data disposed in a send buffer characterized by a read/write send buffer memory address in a read/write virtual address space of the origin endpoint; determining for the send buffer a read-only send buffer memory address in a read-only virtual address space, the read-only virtual address space shared by both the origin endpoint and the target endpoint, with all frames of physical memory mapped to pages of virtual memory in the read-only virtual address space; and communicating by the origin endpoint to the target endpoint an eager send message header that includes the read-only send buffer memory address.

  18. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Davis, Kristan D.; Faraj, Daniel A.

    2014-07-22

    Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
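The size-based selection in this abstract can be sketched directly; the thresholds and algorithm names below are illustrative assumptions, not values from the patent.

```python
# Minimal sketch of size-based algorithm selection: each algorithm is
# associated with a separate, non-overlapping range of message sizes.
RANGES = [
    (0,     512,          "eager send"),   # small messages
    (512,   65536,        "rendezvous"),   # medium messages
    (65536, float("inf"), "RDMA get"),     # large messages
]

def select_by_size(message_size):
    """Select the algorithm whose range contains the message size."""
    for lo, hi, name in RANGES:
        if lo <= message_size < hi:
            return name
    raise LookupError("no range covers this size")

print(select_by_size(100))      # eager send
print(select_by_size(1 << 20))  # RDMA get
```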

  19. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-11-18

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a SEND instruction, the SEND instruction specifying a transmission of transfer data from the origin endpoint to a first target endpoint; transmitting from the origin endpoint to the first target endpoint a Request-To-Send (`RTS`) message advising the first target endpoint of the location and size of the transfer data; assigning by the first target endpoint to each of a plurality of target endpoints separate portions of the transfer data; and receiving by the plurality of target endpoints the transfer data.
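The receive-side division of labor in this abstract, in which the first target endpoint assigns each of a plurality of target endpoints a separate portion of the transfer data, reduces to splitting the size advertised in the RTS message into disjoint (offset, length) portions. A minimal sketch, with illustrative sizes:

```python
def assign_portions(transfer_size, n_endpoints):
    """Split transfer data into per-endpoint (offset, length) portions.

    The portions are contiguous and disjoint, and any remainder is spread
    one byte at a time over the leading endpoints."""
    base, extra = divmod(transfer_size, n_endpoints)
    portions, offset = [], 0
    for i in range(n_endpoints):
        length = base + (1 if i < extra else 0)
        portions.append((offset, length))
        offset += length
    return portions

# The RTS advises location and size; here the first target divides a
# 10-byte transfer among 3 target endpoints.
print(assign_portions(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```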

  20. Fencing data transfers in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-06-30

    Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint comprising a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI and through data communications resources including a deterministic data communications network, including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.
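The key property in this abstract is that the FENCE needs no per-SEND accounting: because the network is deterministic (ordered), a fence marker simply travels behind all previously initiated SENDs. A toy sketch of that ordering argument, with an in-memory queue standing in for the deterministic network:

```python
from collections import deque

channel = deque()   # deterministic, ordered channel between two endpoints
delivered = []      # payloads received so far at the target

def post_send(payload):
    channel.append(("SEND", payload))

def post_fence(on_complete):
    # No counters over outstanding SENDs: in-order delivery alone guarantees
    # the fence is processed only after every SEND posted before it.
    channel.append(("FENCE", on_complete))

def advance():
    """Drain the channel in order, as the deterministic network would."""
    while channel:
        kind, arg = channel.popleft()
        if kind == "SEND":
            delivered.append(arg)
        else:
            arg(list(delivered))  # fence completes after all prior sends

post_send("a")
post_send("b")
post_fence(lambda seen: print("fence done after", seen))
advance()  # fence done after ['a', 'b']
```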

  1. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-09-16

    Eager send data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints that specify a client, a context, and a task, including receiving an eager send data communications instruction with transfer data disposed in a send buffer characterized by a read/write send buffer memory address in a read/write virtual address space of the origin endpoint; determining for the send buffer a read-only send buffer memory address in a read-only virtual address space, the read-only virtual address space shared by both the origin endpoint and the target endpoint, with all frames of physical memory mapped to pages of virtual memory in the read-only virtual address space; and communicating by the origin endpoint to the target endpoint an eager send message header that includes the read-only send buffer memory address.

  2. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2015-02-03

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a SEND instruction, the SEND instruction specifying a transmission of transfer data from the origin endpoint to a first target endpoint; transmitting from the origin endpoint to the first target endpoint a Request-To-Send (`RTS`) message advising the first target endpoint of the location and size of the transfer data; assigning by the first target endpoint to each of a plurality of target endpoints separate portions of the transfer data; and receiving by the plurality of target endpoints the transfer data.

  3. Data communications in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Davis, Kristan D; Faraj, Daniel A

    2013-07-09

    Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.

  4. Fencing data transfers in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-08-11

    Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint comprising a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI and through data communications resources including a deterministic data communications network, including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.

  5. Fencing data transfers in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-06-02

    Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.

  6. Data communications for a collective operation in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Faraj, Daniel A

    2013-07-16

    Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and bit masks; receiving in an origin endpoint of the PAMI a collective instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint; constructing a bit mask for the received collective instruction; selecting, from among the associated algorithms and bit masks, a data communications algorithm in dependence upon the constructed bit mask; and executing the collective instruction, transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.

  7. Fencing data transfers in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-06-09

    Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.

  8. Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A.; Mamidala, Amith R.

    2013-09-03

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  9. Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A; Mamidala, Amith R

    2014-02-11

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  10. Implementation of Generalized Coarse-Mesh Rebalance of NEWTRNX for Acceleration of Parallel Block-Jacobi Transport

    SciTech Connect (OSTI)

    Clarno, Kevin T

    2007-01-01

The NEWTRNX transport module solves the multigroup, discrete-ordinates source-driven or k-eigenvalue transport equation in parallel on a 3-D unstructured tetrahedral mesh using the extended step characteristics (ESC) spatial discretization, also known as the slice-balance approach (SBA). The spatial domains are decomposed using METIS. NEWTRNX is under development for nuclear reactor analysis on computer hardware ranging from clusters to massively parallel machines, like the Cray XT4. Transport methods that rely on full sweeps across the spatial domain have been shown to display poor scaling for thousands of processors. The Parallel Block-Jacobi (PBJ) algorithm allows each spatial partition to sweep over all discrete-ordinate directions and energies independently of all other domains, potentially allowing for much better scaling than is possible with full sweeps. The PBJ algorithm has been implemented in NEWTRNX using a Gauss-Seidel iteration in energy and asynchronous communication by energy group, such that each partition uses the latest boundary solution available for each group before solving the within-group scattering in that group. For each energy group, the within-group scattering is converged with a generalized minimum residual (GMRES) solver, preconditioned with beta transport synthetic acceleration (β-TSA).
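The PBJ idea, each partition solving locally with the latest available boundary values rather than waiting for a global sweep, can be sketched on a tiny 1-D linear system; the system and the two-block partitioning below are illustrative stand-ins, not the NEWTRNX transport equations.

```python
# Block-Jacobi sketch: two "spatial partitions" (blocks) of a tridiagonal
# system A x = b each solve locally, using the other block's values from
# the previous sweep as boundary data.
A = [[2, -1,  0,  0],
     [-1, 2, -1,  0],
     [0, -1,  2, -1],
     [0,  0, -1,  2]]
b = [1, 1, 1, 1]
x = [0.0] * 4
blocks = [(0, 2), (2, 4)]  # each block could run on a separate processor

def solve2(a11, a12, a21, a22, r1, r2):
    """Direct 2x2 solve by Cramer's rule (the local block solve)."""
    det = a11 * a22 - a12 * a21
    return (r1 * a22 - r2 * a12) / det, (a11 * r2 - a21 * r1) / det

for _ in range(100):
    x_new = x[:]
    for lo, hi in blocks:
        # Off-block coupling uses the latest available (previous-sweep) values.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(4) if not lo <= j < hi)
             for i in range(lo, hi)]
        x_new[lo], x_new[lo + 1] = solve2(A[lo][lo], A[lo][lo + 1],
                                          A[lo + 1][lo], A[lo + 1][lo + 1],
                                          r[0], r[1])
    x = x_new

# The exact solution of this system is [2, 3, 3, 2].
print([round(v, 6) for v in x])  # [2.0, 3.0, 3.0, 2.0]
```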

  11. Translation invariant time-dependent massive gravity: Hamiltonian analysis

    SciTech Connect (OSTI)

Mourad, Jihad; Steer, Danièle A.; Noui, Karim E-mail: karim.noui@lmpt.univ-tours.fr

    2014-09-01

The canonical structure of massive gravity in the first-order moving-frame formalism is studied. We work in the simplified context of translation-invariant fields, with mass terms given by general non-derivative interactions, invariant under the diagonal Lorentz group, depending on the moving frame as well as a fixed reference frame. We prove that the only mass terms which give 5 propagating degrees of freedom are the dRGT mass terms, namely those which are linear in the lapse. We also complete the Hamiltonian analysis with the dynamical evolution of the system.

  12. Protecting Recovery Act Cleanup Site During Massive Wildfire

    Office of Environmental Management (EM)

    July 13, 2011 Protecting Recovery Act Cleanup Site During Massive Wildfire LOS ALAMOS, N.M. - Effective safety procedures in place at Los Alamos National Laboratory would have provided protections in the event that the raging Las Conchas fire had spread to the site of an American Recovery and Reinvestment Act project. "Our procedures not only placed the waste excavation site, Materials Disposal Area B (MDA-B), into a safe posture so it was well protected during the fire, but also allowed us

  13. Cosmic expansion histories in massive bigravity with symmetric matter coupling

    SciTech Connect (OSTI)

Enander, Jonas; Mörtsell, Edvard [Oskar Klein Center, Stockholm University, Albanova University Center, 106 91 Stockholm (Sweden); Solomon, Adam R. [DAMTP, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WA (United Kingdom); Akrami, Yashar, E-mail: enander@fysik.su.se, E-mail: a.r.solomon@damtp.cam.ac.uk, E-mail: yashar.akrami@astro.uio.no, E-mail: edvard@fysik.su.se [Institute of Theoretical Astrophysics, University of Oslo, P.O. Box 1029 Blindern, N-0315 Oslo (Norway)

    2015-01-01

We study the cosmic expansion history of massive bigravity with a viable matter coupling which treats both metrics on equal footing. We derive the Friedmann equation for the effective metric through which matter couples to the two metrics, and study its solutions. For certain parameter choices, the background cosmology is identical to that of ΛCDM. More general parameters yield dynamical dark energy, which can still be in agreement with observations of the expansion history. We study specific parameter choices of interest, including minimal models, maximally-symmetric models, and a candidate partially-massless theory.

  14. Closed-form decomposition of one-loop massive amplitudes

    SciTech Connect (OSTI)

    Britto, Ruth; Feng Bo; Mastrolia, Pierpaolo

    2008-07-15

We present formulas for the coefficients of 2-, 3-, 4-, and 5-point master integrals for one-loop massive amplitudes. The coefficients are derived from unitarity cuts in D dimensions. The input parameters can be read off from any unitarity-cut integrand, as assembled from tree-level expressions, after simple algebraic manipulations. The formulas presented here are suitable for analytical as well as numerical evaluation. Their validity is confirmed in two known cases of helicity amplitudes contributing to gg → gg and gg → gH, where the masses of the Higgs and the fermion circulating in the loop are kept as free parameters.

  15. Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications (Conference)

    Office of Scientific and Technical Information (OSTI)

    This paper describes a kernel scheduling algorithm that is based on co-scheduling principles and that is intended for parallel applications running on 1000 cores or more where inter-node scalability is key. Experimental results for a Linux implementation on a Cray XT5 machine are presented.

  16. Linux Kernel Co-Scheduling and Bulk Synchronous Parallelism (Journal Article)

    Office of Scientific and Technical Information (OSTI)

    This paper describes a kernel scheduling algorithm that is based on co-scheduling principles and that is intended for parallel applications running on 1000 cores or more. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable

  17. De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers (Journal Article)

    Office of Scientific and Technical Information (OSTI)

    We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end

  18. Berkeley Unified Parallel C (UPC) Runtime Library

    Energy Science and Technology Software Center (OSTI)

    2003-03-31

This software comprises a portable, open source implementation of a runtime library to support applications written in the Unified Parallel C (UPC) language. This library implements the UPC-specific functionality, including shared memory allocation and locks. The network-dependent functionality is implemented as a thin wrapper around a separate library implementing the GASNet (Global-Address Space Networking) specification. For true shared memory machines, GASNet is bypassed in favor of direct memory operations and local synchronization mechanisms. The Berkeley UPC Runtime Library is currently the only implementation of the "Berkeley UPC Runtime Specification", and thus the only runtime library usable with the Berkeley UPC Compiler. It is also the only UPC runtime known to the author to provide two shared pointer representations: one for arbitrary blocksizes and one optimized for the common cases of phaseless and blocksize=1 pointers. For distributed memory environments, a library implementing the GASNet specification is required for communication. While no specialized hardware is required, a high-speed interconnect supported by the GASNet implementation is suggested for performance. If no supported high-speed interconnect is available, GASNet can run over MPI. An external library is required for certain local memory allocation operations. A well-defined interface allows for multiple implementations of this library, but at present the "umalloc" library from LBNL is the only compatible implementation.
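The two shared-pointer representations mentioned above can be sketched as follows; the field layout and THREADS count are illustrative assumptions, not the Berkeley UPC internals.

```python
# UPC distributes array elements round-robin in blocks of `blocksize`
# across threads. A general shared pointer tracks (thread, phase, local
# offset); the phaseless / blocksize=1 case needs only (thread, offset).
THREADS = 4  # illustrative thread count

def to_shared_ptr(idx, blocksize):
    """General (arbitrary-blocksize) representation: (thread, phase, local_off).

    `phase` is the element's position within its block."""
    block, phase = divmod(idx, blocksize)
    thread = block % THREADS
    local_off = (block // THREADS) * blocksize + phase
    return (thread, phase, local_off)

def to_cyclic_ptr(idx):
    """Optimized phaseless / blocksize=1 representation: phase is always 0,
    so only (thread, local_off) need be tracked."""
    return (idx % THREADS, idx // THREADS)

print(to_shared_ptr(10, 3))  # element 10, blocksize 3 -> (3, 1, 1)
print(to_cyclic_ptr(10))     # (2, 2)
```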

  19. Flexible Language Constructs for Large Parallel Programs

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Rosing, Matt; Schnabel, Robert

    1994-01-01

The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. We also discuss some of the critical implementation details.

  20. Multi-petascale highly efficient parallel supercomputer

    DOE Patents [OSTI]

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power, and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
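Neighbor addressing on a five-dimensional torus can be sketched directly: each node has ten logical neighbors (±1 in each of five dimensions, with wraparound). The per-dimension extents below are illustrative, not the machine's actual configuration.

```python
DIMS = (4, 4, 4, 4, 2)  # illustrative extents of the five torus dimensions

def torus_neighbors(coord):
    """Return the ten ±1 neighbors of a node, with wraparound per dimension."""
    neighbors = []
    for d in range(5):
        for step in (-1, 1):
            n = list(coord)
            n[d] = (n[d] + step) % DIMS[d]  # torus wraparound in dimension d
            neighbors.append(tuple(n))
    return neighbors

nbrs = torus_neighbors((0, 0, 0, 0, 0))
print(len(nbrs))  # 10
```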

  1. Mesoscale Simulations of Particulate Flows with Parallel Distributed...

    Office of Scientific and Technical Information (OSTI)

    Title: Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique Fluid particulate flows are common phenomena in nature and industry. ...

  2. A set of parallel, implicit methods for a reconstructed discontinuous...

    Office of Scientific and Technical Information (OSTI)

    Journal Article: A set of parallel, implicit methods for a reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids Citation Details In-Document Search...

  3. Building the Next Generation of Parallel Applications: Co-Design...

    Office of Scientific and Technical Information (OSTI)

    Applications: Co-Design Opportunities and Challenges. Citation Details In-Document Search Title: Building the Next Generation of Parallel Applications: Co-Design Opportunities and ...

  4. Using ARM data to correct plane-parallel satellite retrievals...

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Using ARM data to correct plane-parallel satellite retrievals of cloud properties Dong, Xiquan University of North Dakota Minnis, Patrick NASA Langley Research Center Xi, Baike...

  5. A set of parallel, implicit methods for a reconstructed discontinuous...

    Office of Scientific and Technical Information (OSTI)

    Furthermore, an SPMD (single program, multiple data) programming paradigm based on MPI is proposed to achieve parallelism. The numerical results on complex geometries...

  6. Mesoscale simulations of particulate flows with parallel distributed...

    Office of Scientific and Technical Information (OSTI)

    distributed Lagrange multiplier technique Citation Details In-Document Search Title: Mesoscale simulations of particulate flows with parallel distributed Lagrange multiplier ...

  7. A garbage collection algorithm for shared memory parallel processors

    SciTech Connect (OSTI)

Crammond, J.

    1988-12-01

    This paper describes a technique for adapting the Morris sliding garbage collection algorithm to execute on parallel machines with shared memory. The algorithm is described within the framework of an implementation of the parallel logic language Parlog. However, the algorithm is a general one and can easily be adapted to parallel Prolog systems and to other languages. The performance of the algorithm executing a few simple Parlog benchmarks is analyzed. Finally, it is shown how the technique for parallelizing the sequential algorithm can be adapted for a semi-space copying algorithm.
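The sliding-compaction idea underlying the Morris algorithm, in which live objects slide toward one end of the heap in allocation order, can be sketched by computing forwarding addresses as a running sum of the live sizes that precede each object. The pointer-threading details and the parallel work division from the paper are omitted, and the heap layout is an illustrative assumption.

```python
# (addr, size, live) triples, sorted by address; an invented heap snapshot.
heap = [(0, 4, True), (4, 2, False), (6, 3, True), (9, 5, False), (14, 1, True)]

def forwarding_addresses(objects):
    """Map each live object's address to its post-slide address.

    Objects keep their relative order, so the new address of a live object
    is simply the total size of the live objects before it."""
    fwd, free = {}, 0
    for addr, size, live in objects:
        if live:
            fwd[addr] = free
            free += size
    return fwd

print(forwarding_addresses(heap))  # {0: 0, 6: 4, 14: 7}
```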

  8. Current parallel I/O limitations to scalable data analysis.

    SciTech Connect (OSTI)

    Mascarenhas, Ajith Arthur; Pebay, Philippe Pierre

    2011-07-01

This report describes the limitations to parallel scalability which we have encountered when applying our otherwise optimally scalable parallel statistical analysis tool kit to large data sets distributed across the parallel file system of the current premier DOE computational facility. It describes our study of the effect of parallel I/O on the overall scalability of a parallel data analysis pipeline using our scalable parallel statistics tool kit [PTBM11]. To this end, we tested it on the Jaguar-pf DOE/ORNL peta-scale platform with large combustion simulation data under a variety of process counts and domain decomposition scenarios. In this report we have recalled the foundations of the parallel statistical analysis tool kit which we have designed and implemented, with the specific double intent of reproducing typical data analysis workflows and achieving optimal design for scalable parallel implementations. We have briefly reviewed those earlier results and publications which allow us to conclude that we have achieved both goals. However, in this report we have further established that, when used in conjunction with a state-of-the-art parallel I/O system, as can be found on the premier DOE peta-scale platform, the scaling properties of the overall analysis pipeline comprising parallel data access routines degrade rapidly. This finding is problematic and must be addressed if peta-scale data analysis is to be made scalable, or even possible. To address these parallel I/O limitations, we will investigate the use of the Adaptable IO System (ADIOS) [LZL+10] to improve I/O performance while maintaining flexibility for a variety of I/O options, such as MPI IO and POSIX IO. This system is developed at ORNL and other collaborating institutions, and is being tested extensively on Jaguar-pf. Simulation codes being developed on these systems will also use ADIOS to output the data, thereby making it easier for other systems, such as ours, to process that data.
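The parallel data access pattern whose scaling the report studies amounts to each process writing its domain to a disjoint region of a shared file; a real run would use MPI IO or ADIOS, but the offset bookkeeping reduces to an exclusive prefix sum. The rank count and block sizes below are illustrative.

```python
def file_offsets(block_sizes):
    """Exclusive prefix sum: the file offset where each rank's block starts,
    plus the total file size. Disjoint regions let ranks write concurrently."""
    offsets, total = [], 0
    for size in block_sizes:
        offsets.append(total)
        total += size
    return offsets, total

# Four ranks with unequal domain sizes (bytes).
sizes = [100, 80, 120, 100]
offsets, total = file_offsets(sizes)
print(offsets, total)  # [0, 100, 180, 300] 400
```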

  9. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-18

    Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
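The geometry/endpoint/callback pattern in this abstract can be sketched as below; the class and method names are invented for illustration and are not the PAMI API.

```python
class Geometry:
    """Toy model of a data communications geometry: a set of endpoints,
    a list of valid collective algorithms, and per-endpoint callbacks."""

    def __init__(self, endpoints, algorithms):
        self.endpoints = endpoints     # endpoints used in collective ops
        self.algorithms = algorithms   # collective algorithms valid here
        self.callbacks = {}            # endpoint -> dispatch callback
        self.pending = []

    def register(self, endpoint, callback):
        """Register a dispatch callback for a collective on one endpoint."""
        self.callbacks[endpoint] = callback

    def start_collective(self, data):
        """Non-blocking: queue the operation and return immediately."""
        self.pending.append(data)

    def advance(self):
        """Progress engine: deliver queued operations via the callbacks."""
        while self.pending:
            data = self.pending.pop(0)
            for ep in self.endpoints:
                self.callbacks[ep](ep, data)

results = []
geo = Geometry(endpoints=[0, 1, 2], algorithms=["binomial broadcast"])
for ep in geo.endpoints:
    geo.register(ep, lambda ep, data: results.append((ep, data)))
geo.start_collective("payload")  # returns without blocking
geo.advance()                    # callbacks fire during progress
print(results)  # [(0, 'payload'), (1, 'payload'), (2, 'payload')]
```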

  10. Petascale Parallelization of the Gyrokinetic Toroidal Code

    SciTech Connect (OSTI)

    Ethier, Stephane; Adams, Mark; Carter, Jonathan; Oliker, Leonid

    2010-05-01

    The Gyrokinetic Toroidal Code (GTC) is a global, three-dimensional particle-in-cell application developed to study microturbulence in tokamak fusion devices. The global capability of GTC is unique, allowing researchers to systematically analyze important dynamics such as turbulence spreading. In this work we examine a new radial domain decomposition approach to allow scalability onto the latest generation of petascale systems. Extensive performance evaluation is conducted on three high performance computing systems: the IBM BG/P, the Cray XT4, and an Intel Xeon Cluster. Overall results show that the radial decomposition approach dramatically increases scalability, while reducing the memory footprint - allowing for fusion device simulations at an unprecedented scale. After a decade where high-end computing (HEC) was dominated by the rapid pace of improvements to processor frequencies, the performance of next-generation supercomputers is increasingly differentiated by varying interconnect designs and levels of integration. Understanding the tradeoffs of these system designs is a key step towards making effective petascale computing a reality. In this work, we examine a new parallelization scheme for the Gyrokinetic Toroidal Code (GTC) [?] micro-turbulence fusion application. Extensive scalability results and analysis are presented on three HEC systems: the IBM BlueGene/P (BG/P) at Argonne National Laboratory, the Cray XT4 at Lawrence Berkeley National Laboratory, and an Intel Xeon cluster at Lawrence Livermore National Laboratory. Overall results indicate that the new radial decomposition approach successfully attains unprecedented scalability to 131,072 BG/P cores by overcoming the memory limitations of the previous approach. The new version is well suited to utilize emerging petascale resources to access new regimes of physical phenomena.
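
    The memory benefit of a radial domain decomposition can be sketched as follows. This is an illustrative partitioning of a 1-D radial grid with ghost cells, not GTC's actual scheme or data layout: each process owns only a contiguous radial band, so per-process memory shrinks as the process count grows.

```python
# Hypothetical sketch of a radial domain decomposition: each process owns a
# contiguous band of radial grid points (plus ghost cells for neighbor data).
def radial_partition(n_radial, n_procs, ghost=1):
    """Split n_radial grid points into n_procs contiguous bands, returning
    (start, end) index pairs that include the ghost cells on each side."""
    base, rem = divmod(n_radial, n_procs)
    bands, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < rem else 0)
        lo = max(0, start - ghost)             # ghost cell toward the axis
        hi = min(n_radial, start + size + ghost)  # ghost cell toward the edge
        bands.append((lo, hi))
        start += size
    return bands

bands = radial_partition(1000, 8)
print(bands[0], bands[-1])  # → (0, 126) (874, 1000)
```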

  11. Parallel 3-D method of characteristics in MPACT

    SciTech Connect (OSTI)

    Kochunas, B.; Downar, T. J.; Liu, Z.

    2013-07-01

    A new parallel 3-D MOC kernel has been developed and implemented in MPACT which makes use of the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model makes use of both distributed and shared memory parallelism which are implemented with the MPI and OpenMP standards, respectively. The kernel is capable of parallel decomposition of problems in space, angle, and by characteristic rays up to O(10{sup 4}) processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT k{sub eff} differs from the benchmark results for rodded and un-rodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors; all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time for the 500 processor case was 231 seconds and 19 seconds for the case with 15625 processors. Ongoing work is focused on developing theoretical performance models and the implementation of acceleration techniques to minimize the number of iterations to converge. (authors)
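
    As a minimal sketch of how such parallel-efficiency figures are computed, the two quoted run times give the efficiency of the 15625-processor run relative to the 500-processor run (the abstract's 60% worst-case figure is presumably measured against a different, unstated baseline decomposition):

```python
# Relative parallel efficiency between two of the quoted MPACT runs:
# 500 processors at 231 s versus 15625 processors at 19 s.
def relative_efficiency(p_ref, t_ref, p_new, t_new):
    """Efficiency of the larger run relative to the smaller one:
    measured speedup divided by the ideal speedup p_new / p_ref."""
    speedup = t_ref / t_new
    ideal = p_new / p_ref
    return speedup / ideal

eff = relative_efficiency(500, 231.0, 15625, 19.0)
print(round(eff, 3))  # → 0.389
```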

  12. Spectral function of a fermion coupled with a massive vector boson at finite temperature in a gauge invariant formalism

    Office of Scientific and Technical Information (OSTI)

    We investigate spectral properties of a fermion coupled with a massive gauge boson with a mass m at finite temperature (T) in

  13. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; Prins, Jan F.

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
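
    The definition of work time inflation given above can be stated directly in code; the per-thread measurements below are hypothetical:

```python
# Work time inflation, as defined in the abstract: time threads spend doing
# the computation's work beyond what the same work costs sequentially.
def work_time_inflation(per_thread_work, sequential_time):
    """Sum of time the threads spent actually working, minus the sequential
    work time. Idleness and scheduling overhead are excluded by construction:
    only working time is counted, so any excess is inflation."""
    return sum(per_thread_work) - sequential_time

# Hypothetical measurements (seconds): 4 threads whose combined working time
# exceeds the 10 s sequential run, e.g. due to NUMA remote-memory latency.
inflation = work_time_inflation([3.0, 3.2, 3.1, 2.9], 10.0)
print(round(inflation, 2))  # → 2.2
```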

  14. Broadcasting collective operation contributions throughout a parallel computer

    DOE Patents [OSTI]

    Faraj, Ahmad (Rochester, MN)

    2012-02-21

    Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
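
    A toy model of the two-phase scheme described above (purely illustrative, not the patented implementation) shows the intended effect: intra-node exchange is handled locally, while the designated inter-node link carries one send per processor, in a serial transmission sequence.

```python
# Toy model of the two-phase broadcast: phase 1, processors on a node exchange
# contributions locally; phase 2, each processor in turn forwards its
# contribution over the designated inter-node link.
def broadcast_contributions(nodes):
    """nodes: list of nodes, each a list of per-processor contributions.
    Returns (set every processor ends up with, number of inter-node sends)."""
    # Phase 1: intra-node -- every processor learns its own node's values.
    per_node = [set(node) for node in nodes]
    # Phase 2: inter-node -- serial transmission sequence, one send per
    # processor, each reaching the processors on all other nodes.
    inter_node_sends = sum(len(node) for node in nodes)
    everything = set().union(*per_node)
    return everything, inter_node_sends

final, sends = broadcast_contributions([[1, 2], [3, 4], [5, 6]])
print(sorted(final), sends)  # → [1, 2, 3, 4, 5, 6] 6
```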

  15. Parallel architecture for real-time simulation. Master's thesis

    SciTech Connect (OSTI)

    Cockrell, C.D.

    1989-01-01

    This thesis is concerned with the development of a very fast and highly efficient parallel computer architecture for real-time simulation of continuous systems. Currently, several parallel processing systems exist that may be capable of executing a complex simulation in real-time. These systems are examined and the pros and cons of each system discussed. The thesis then introduced a custom-designed parallel architecture based upon The University of Alabama's OPERA architecture. Each component of this system is discussed and rationale presented for its selection. The problem selected, real-time simulation of the Space Shuttle Main Engine for the test and evaluation of the proposed architecture, is explored, identifying the areas where parallelism can be exploited and parallel processing applied. Results from the test and evaluation phase are presented and compared with the results of the same problem that has been processed on a uniprocessor system.

  16. LIMB-DARKENED RADIATION-DRIVEN WINDS FROM MASSIVE STARS

    SciTech Connect (OSTI)

    Curé, M.; Cidale, L.

    2012-10-01

    We calculated the influence of the limb-darkened finite-disk correction factor in the theory of radiation-driven winds from massive stars. We solved the one-dimensional m-CAK hydrodynamical equation of rotating radiation-driven winds for all three known solutions, i.e., fast, {Omega}-slow, and {delta}-slow. We found that for the fast solution, the mass-loss rate is increased by a factor of {approx}10%, while the terminal velocity is reduced about 10%, when compared with the solution using a finite-disk correction factor from a uniformly bright star. For the other two slow solutions, the changes are almost negligible. Although we found that the limb darkening has no effects on the wind-momentum-luminosity relationship, it would affect the calculation of synthetic line profiles and the derivation of accurate wind parameters.

  17. Galileons coupled to massive gravity: general analysis and cosmological solutions

    SciTech Connect (OSTI)

    Goon, Garrett; Trodden, Mark; Gümrükçüoğlu, A. Emir; Hinterbichler, Kurt; Mukohyama, Shinji E-mail: Emir.Gumrukcuoglu@nottingham.ac.uk E-mail: shinji.mukohyama@ipmu.jp

    2014-08-01

    We further develop the framework for coupling galileons and Dirac-Born-Infeld (DBI) scalar fields to a massive graviton while retaining both the non-linear symmetries of the scalars and ghost-freedom of the theory. The general construction is recast in terms of vielbeins which simplifies calculations and allows for compact expressions. Expressions for the general form of the action are derived, with special emphasis on those models which descend from maximally symmetric spaces. We demonstrate the existence of maximally symmetric solutions to the fully non-linear theory and analyze their spectrum of quadratic fluctuations. Finally, we consider self-accelerating cosmological solutions and study their perturbations, showing that the vector and scalar modes have vanishing kinetic terms.

  18. Nanowire growth by an electron beam induced massive phase transformation

    SciTech Connect (OSTI)

    Sood, Shantanu; Kisslinger, Kim; Gouma, Perena

    2014-11-15

    Tungsten trioxide nanowires of a high aspect ratio have been synthesized in-situ in a TEM under an electron beam of current density 14 A/cm² due to a massive polymorphic reaction. Sol-gel processed pseudocubic phase nanocrystals of tungsten trioxide were seen to rapidly transform to one dimensional monoclinic phase configurations, and this reaction was independent of the substrate on which the material was deposited. The mechanism of the self-catalyzed polymorphic transition and accompanying radical shape change is a typical characteristic of metastable to stable phase transformations in nanostructured polymorphic metal oxides. A heuristic model is used to confirm the metastable to stable growth mechanism. The findings are important to the controlled electron beam deposition of nanowires for functional applications starting from colloidal precursors.

  19. Search for Charged Massive Long-Lived Particles

    SciTech Connect (OSTI)

    Abazov V. M.; Abbott B.; Acharya B. S.; Adams M.; Adams T.; Alexeev G. D.; Alimena J.; Alkhazov G.; Alton A.; Alverson G.; Alves G. A.; Aoki M.; Askew A.; Asman B.; Atkins S.; Atramentov O.; Augsten K.; Avila C.; BackusMayes J.; Badaud F.; Bagby L.; Baldin B.; Bandurin D. V.; Banerjee S.; Barberis E.; Baringer P.; Barreto J.; Bartlett J. F.; Bassler U.; Bazterra V.; Bean A.; Begalli M.; Belanger-Champagne C.; Bellantoni L.; Beri S. B.; Bernardi G.; Bernhard R.; Bertram I.; Besancon M.; Beuselinck R.; Bezzubov V. A.; Bhat P. C.; Bhatnagar V.; Blazey G.; Blessing S.; Bloom K.; Boehnlein A.; Boline D.; Boos E. E.; Borissov G.; Bose T.; Brandt A.; Brandt O.; Brock R.; Brooijmans G.; Bross A.; Brown D.; Brown J.; Bu X. B.; Buehler M.; Buescher V.; Bunichev V.; Burdin S.; Burnett T. H.; Buszello C. P.; Calpas B.; Camacho-Perez E.; Carrasco-Lizarraga M. A.; Casey B. C. K.; Castilla-Valdez H.; Chakrabarti S.; Chakraborty D.; Chan K. M.; Chandra A.; Chapon E.; Chen G.; Chevalier-Thery S.; Cho D. K.; Cho S. W.; Choi S.; Choudhary B.; Cihangir S.; Claes D.; Clutter J.; Cooke M.; Cooper W. E.; Corcoran M.; Couderc F.; Cousinou M. -C.; Croc A.; Cutts D.; Das A.; Davies G.; De K.; de Jong S. J.; De la Cruz-Burelo E.; Deliot F.; Demina R.; Denisov D.; Denisov S. P.; Desai S.; Deterre C.; DeVaughan K.; Diehl H. T.; Diesburg M.; Ding P. F.; Dominguez A.; Dorland T.; Dubey A.; Dudko L. V.; Duggan D.; Duperrin A.; Dutt S.; Dyshkant A.; Eads M.; Edmunds D.; Ellison J.; Elvira V. D.; Enari Y.; Evans H.; Evdokimov A.; Evdokimov V. N.; Facini G.; Ferbel T.; Fiedler F.; Filthaut F.; Fisher W.; Fisk H. E.; Fortner M.; Fox H.; Fuess S.; Garcia-Bellido A.; Garcia-Guerra G. A.; Gavrilov V.; Gay P.; Geng W.; Gerbaudo D.; Gerber C. E.; Gershtein Y.; Ginther G.; Golovanov G.; Goussiou A.; Grannis P. D.; Greder S.; Greenlee H.; Greenwood Z. D.; Gregores E. M.; Grenier G.; Gris Ph.; Grivaz J. -F.; Grohsjean A.; Gruenendahl S.; Gruenewald M. 
W.; Guillemin T.; Gutierrez G.; Gutierrez P.; Haas A.; Hagopian S.; Haley J.; Han L.; Harder K.; Harel A.; Hauptman J. M.; Hays J.; Head T.; Hebbeker T.; Hedin D.; Hegab H.; Heinson A. P.; Heintz U.; Hensel C.; Heredia-De La Cruz I.; Herner K.; Hesketh G.; Hildreth M. D.; Hirosky R.; Hoang T.; Hobbs J. D.; Hoeneisen B.; Hohlfeld M.; Hubacek Z.; Hynek V.; Iashvili I.; Ilchenko Y.; Illingworth R.; Ito A. S.; Jabeen S.; Jaffre M.; Jamin D.; Jayasinghe A.; Jesik R.; Johns K.; Johnson M.; Jonckheere A.; Jonsson P.; Joshi J.; Jung A. W.; Juste A.; Kaadze K.; Kajfasz E.; Karmanov D.; Kasper P. A.; Katsanos I.; Kehoe R.; Kermiche S.; Khalatyan N.; Khanov A.; Kharchilava A.; Kharzheev Y. N.; Kohli J. M.; Kozelov A. V.; Kraus J.; Kulikov S.; Kumar A.; Kupco A.; Kurca T.; Kuzmin V. A.; Kvita J.; Lammers S.; Landsberg G.; Lebrun P.; Lee H. S.; Lee S. W.; Lee W. M.; Lellouch J.; Li L.; Li Q. Z.; Lietti S. M.; Lim J. K.; Lincoln D.; Linnemann J.; Lipaev V. V.; Lipton R.; Liu Y.; Lobodenko A.; Lokajicek M.; de Sa R. Lopes; Lubatti H. J.; Luna-Garcia R.; Lyon A. L.; Maciel A. K. A.; Mackin D.; Madar R.; Magana-Villalba R.; Malik S.; Malyshev V. L.; Maravin Y.; Martinez-Ortega J.; McCarthy R.; McGivern C. L.; Meijer M. M.; et al.

    2012-03-21

    We report on a search for charged massive long-lived particles (CMLLPs), based on 5.2 fb{sup -1} of integrated luminosity collected with the D0 detector at the Fermilab Tevatron p{bar p} collider. We search for events in which one or more particles are reconstructed as muons but have speed and ionization energy loss (dE/dx) inconsistent with muons produced in beam collisions. CMLLPs are predicted in several theories of physics beyond the standard model. We exclude pair-produced long-lived gaugino-like charginos below 267 GeV and Higgsino-like charginos below 217 GeV at 95% C.L., as well as long-lived scalar top quarks with mass below 285 GeV.

  20. Nanowire growth by an electron beam induced massive phase transformation

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Sood, Shantanu; Kisslinger, Kim; Gouma, Perena

    2014-11-15

    Tungsten trioxide nanowires of a high aspect ratio have been synthesized in-situ in a TEM under an electron beam of current density 14 A/cm² due to a massive polymorphic reaction. Sol-gel processed pseudocubic phase nanocrystals of tungsten trioxide were seen to rapidly transform to one dimensional monoclinic phase configurations, and this reaction was independent of the substrate on which the material was deposited. The mechanism of the self-catalyzed polymorphic transition and accompanying radical shape change is a typical characteristic of metastable to stable phase transformations in nanostructured polymorphic metal oxides. A heuristic model is used to confirm the metastable to stable growth mechanism. The findings are important to the controlled electron beam deposition of nanowires for functional applications starting from colloidal precursors.

  1. X-RAY EMISSION FROM MAGNETIC MASSIVE STARS

    SciTech Connect (OSTI)

    Nazé, Yaël; Petit, Véronique; Rinbrand, Melanie; Owocki, Stan; Cohen, David; Ud-Doula, Asif; Wade, Gregg A.

    2014-11-01

    Magnetically confined winds of early-type stars are expected to be sources of bright and hard X-rays. To clarify the systematics of the observed X-ray properties, we have analyzed a large series of Chandra and XMM-Newton observations, corresponding to all available exposures of known massive magnetic stars (over 100 exposures covering ~60% of stars compiled in the catalog of Petit et al.). We show that the X-ray luminosity is strongly correlated with the stellar wind mass-loss rate, with a power-law form that is slightly steeper than linear for the majority of the less luminous, lower-Ṁ B stars and flattens for the more luminous, higher-Ṁ O stars. As the winds are radiatively driven, these scalings can be equivalently written as relations with the bolometric luminosity. The observed X-ray luminosities, and their trend with mass-loss rates, are well reproduced by new MHD models, although a few overluminous stars (mostly rapidly rotating objects) exist. No relation is found between other X-ray properties (plasma temperature, absorption) and stellar or magnetic parameters, contrary to expectations (e.g., higher temperature for stronger mass-loss rate). This suggests that the main driver for the plasma properties is different from the main determinant of the X-ray luminosity. Finally, variations of the X-ray hardnesses and luminosities, in phase with the stellar rotation period, are detected for some objects and they suggest that some temperature stratification exists in massive stars' magnetospheres.

  2. Massive graviton on arbitrary background: derivation, syzygies, applications

    SciTech Connect (OSTI)

    Bernard, Laura; Deffayet, Cédric; Strauss, Mikael von

    2015-06-23

    We give the detailed derivation of the fully covariant form of the quadratic action and the derived linear equations of motion for a massive graviton in an arbitrary background metric (which were presented in arXiv:1410.8302 [hep-th]). Our starting point is the de Rham-Gabadadze-Tolley (dRGT) family of ghost-free massive gravities and using a simple model of this family, we are able to express this action and these equations of motion in terms of a single metric in which the graviton propagates, hence removing in particular the need for a "reference metric" which is present in the non-perturbative formulation. We show further how 5 covariant constraints can be obtained including one which leads to the tracelessness of the graviton on flat space-time and removes the Boulware-Deser ghost. This last constraint involves powers and combinations of the curvature of the background metric. The 5 constraints are obtained for a background metric which is unconstrained, i.e. which does not have to obey the background field equations. We then apply these results to the case of Einstein space-times, where we show that the 5 constraints become trivial, and Friedmann-Lemaître-Robertson-Walker space-times, for which we correct in particular some results that appeared elsewhere. To reach our results, we derive several non-trivial identities, syzygies, involving the graviton fields, its derivatives and the background metric curvature. These identities have their own interest. We also discover that there exist backgrounds for which the dRGT equations cannot be unambiguously linearized.

  3. Electrostatically focused addressable field emission array chips (AFEA's) for high-speed massively parallel maskless digital E-beam direct write lithography and scanning electron microscopy

    DOE Patents [OSTI]

    Thomas, Clarence E.; Baylor, Larry R.; Voelkl, Edgar; Simpson, Michael L.; Paulus, Michael J.; Lowndes, Douglas H.; Whealton, John H.; Whitson, John C.; Wilgen, John B.

    2002-12-24

    Systems and methods are described for addressable field emission array (AFEA) chips. A method of operating an addressable field-emission array includes: generating a plurality of electron beams from a plurality of emitters that compose the addressable field-emission array; and focusing at least one of the plurality of electron beams with an on-chip electrostatic focusing stack. The systems and methods provide advantages including the avoidance of space-charge blow-up.

  4. Xyce Parallel Electronic Simulator : users' guide, version 4.1.

    SciTech Connect (OSTI)

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

  5. Xyce parallel electronic simulator : users' guide. Version 5.1.

    SciTech Connect (OSTI)

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-11-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

  6. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    SciTech Connect (OSTI)

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  7. Nemesis I: Parallel Enhancements to ExodusII

    Energy Science and Technology Software Center (OSTI)

    2006-03-28

    NEMESIS I is an enhancement to the EXODUS II finite element database model used to store and retrieve data for unstructured parallel finite element analyses. NEMESIS I adds data structures which facilitate the partitioning of a scalar (standard serial) EXODUS II file onto parallel disk systems found on many parallel computers. Since the NEMESIS I application programming interface (API) can be used to append information to an existing EXODUS II file, applications that read EXODUS II files can be used on files which contain NEMESIS I information. The NEMESIS I information is written and read via C or C++ callable functions which comprise the NEMESIS I API.

  8. TECA: A Parallel Toolkit for Extreme Climate Analysis

    SciTech Connect (OSTI)

    Prabhat, Mr; Ruebel, Oliver; Byna, Surendra; Wu, Kesheng; Li, Fuyu; Wehner, Michael; Bethel, E. Wes

    2012-03-12

    We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.
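
    The modes of parallelism listed above can be sketched as a simple work partitioning. The function below round-robins (ensemble member, timestep) pairs across ranks so each rank runs an event detector on its own slice; it is an illustration of the idea, not TECA's API.

```python
# Sketch of distributing (ensemble member, timestep) pairs over ranks, one of
# the modes of parallelism climate datasets expose. Names are illustrative.
from itertools import product

def partition_work(n_ensemble, n_timesteps, n_ranks):
    """Round-robin (member, timestep) pairs across ranks; each rank would
    then run the extreme-event detector on its assigned pairs."""
    work = [[] for _ in range(n_ranks)]
    for i, item in enumerate(product(range(n_ensemble), range(n_timesteps))):
        work[i % n_ranks].append(item)
    return work

work = partition_work(3, 4, 5)
print([len(w) for w in work])  # → [3, 3, 2, 2, 2]
```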

  9. pcircle - A Suite of Scalable Parallel File System Tools

    Energy Science and Technology Software Center (OSTI)

    2015-10-01

    Most file system software is written for conventional local file systems; it is serial and cannot take advantage of a large-scale parallel file system. The "pcircle" software builds on top of ubiquitous MPI in cluster computing environments and the "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular, it implements parallel data copy and parallel data checksumming, with advanced features such as async progress reporting, checkpoint and restart, as well as integrity checking.
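
    The work-stealing pattern mentioned above can be sketched in a single process. This illustration (not pcircle's MPI implementation) has idle workers steal file chunks from the busiest worker's deque and checksum them:

```python
# Minimal single-process sketch of work-stealing checksumming: chunks are
# dealt onto per-worker deques; a worker with an empty deque steals from the
# busiest one. A round-robin loop stands in for concurrent workers.
from collections import deque
import hashlib

def run_workers(chunks, n_workers):
    """Checksum every chunk, balancing load by stealing from busy deques."""
    queues = [deque() for _ in range(n_workers)]
    for i, chunk in enumerate(chunks):
        queues[i % n_workers].append(chunk)
    digests = []
    while any(queues):
        for q in queues:
            if not q:  # idle worker: steal from the back of the busiest deque
                victim = max(queues, key=len)
                if victim:
                    q.append(victim.pop())
            if q:
                digests.append(hashlib.md5(q.popleft()).hexdigest())
    return digests

sums = run_workers([b"a", b"b", b"c", b"d", b"e"], 2)
print(len(sums))  # → 5
```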

  10. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
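
    The claimed sequence, establishing a geometry with a list of valid collective algorithms, registering a dispatch callback per endpoint, and executing a collective without blocking, can be sketched as follows (illustrative Python, not the patented PAMI code; all names are hypothetical):

```python
# Illustrative sketch of the claim's sequence: the geometry lists the
# collective algorithms valid for its endpoints, each endpoint registers a
# dispatch callback, and one endpoint starts the collective without blocking.
class Geometry:
    def __init__(self, endpoints, algorithms):
        self.endpoints = endpoints    # endpoints participating in collectives
        self.algorithms = algorithms  # collective algorithms valid here
        self.callbacks = {}

    def register(self, endpoint, callback):
        """Register a dispatch callback for a collective on one endpoint."""
        self.callbacks[endpoint] = callback

    def execute_nonblocking(self, op, data):
        """Start the collective through one endpoint; return immediately.
        Completion is driven later and fires every registered callback."""
        assert op in self.algorithms
        return lambda: [cb(op, data) for cb in self.callbacks.values()]

done = []
geom = Geometry(endpoints=["ep0", "ep1"], algorithms=["broadcast"])
for ep in geom.endpoints:
    geom.register(ep, lambda op, data: done.append((op, data)))
handle = geom.execute_nonblocking("broadcast", 42)  # returns without blocking
handle()                                            # later: drive completion
print(done)  # → [('broadcast', 42), ('broadcast', 42)]
```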

  11. A Comprehensive Look at High Performance Parallel I/O

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    of calculations per second, generating a tsunami of data along the way. In this era of "big data," high performance parallel I/O, the way disk drives efficiently read and write...

  12. Apply for the Parallel Computing Summer Research Internship

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Parallel Computing » How to Apply Apply for the Parallel Computing Summer Research Internship Creating next-generation leaders in HPC research and applications development Program Co-Lead Robert (Bob) Robey Email Program Co-Lead Gabriel Rockefeller Email Program Co-Lead Hai Ah Nam Email Professional Staff Assistant Nicole Aguilar Garcia (505) 665-3048 Email Current application deadline is February 5, 2016 with notification by early March 2016. Who can apply? Upper division undergraduate

  13. Parallel and Antiparallel Interfacial Coupling in AF-FM Bilayers

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Parallel and Antiparallel Interfacial Coupling in AF-FM Bilayers Parallel and Antiparallel Interfacial Coupling in AF-FM Bilayers Print Wednesday, 30 August 2006 00:00 Cooling an antiferromagnetic-ferromagnetic bilayer in a magnetic field typically results in a remanent (zero-field) magnetization in the ferromagnet (FM) that is always in the direction of the field during cooling (positive Mrem). Strikingly, when FeF2 is the antiferromagnet (AF), cooling in a field can lead to a remanent

  14. Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique

    Office of Scientific and Technical Information (OSTI)

    Fluid particulate flows are common phenomena in nature and industry. Modeling of such flows at micro and macro levels as well as establishing relationships between these approaches are needed to

  15. Mesoscale simulations of particulate flows with parallel distributed Lagrange multiplier technique

    Office of Scientific and Technical Information (OSTI)

    Authors: Kanarska, Y; Lomov, I; Antoun, T Publication Date: 2010-09-10 OSTI Identifier: 1120915 Report Number(s): LLNL-JRNL-455392 DOE Contract Number: W-7405-ENG-48

  16. Parallel Integral Curves (Book) | SciTech Connect

    Office of Scientific and Technical Information (OSTI)

    Book: Parallel Integral Curves Citation Details In-Document Search Title: Parallel Integral Curves Authors: Pugmire, Dave [1] ; Peterka, Tom [2] ; Garth, Christoph [3] + Show Author Affiliations ORNL Argonne National Laboratory (ANL) unknown Publication Date: 2012-01-01 OSTI Identifier: 1096343 DOE Contract Number: DE-AC05-00OR22725 Resource Type: Book Publisher: Chapman & Hall/CRC Press, Tampa, FL, USA Research Org: Oak Ridge National Laboratory (ORNL) Sponsoring Org: SC USDOE - Office of

  17. Mesoscale Simulations of Particulate Flows with Parallel Distributed

    Office of Scientific and Technical Information (OSTI)

    Lagrange Multiplier Technique (Conference) | SciTech Connect Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique Citation Details In-Document Search Title: Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique You are accessing a document from the Department of Energy's (DOE) SciTech Connect. This site is a product of DOE's Office of Scientific and Technical Information (OSTI) and is provided as a

  18. Mesoscale simulations of particulate flows with parallel distributed

    Office of Scientific and Technical Information (OSTI)

    Lagrange multiplier technique (Journal Article) | SciTech Connect Journal Article: Mesoscale simulations of particulate flows with parallel distributed Lagrange multiplier technique Citation Details In-Document Search Title: Mesoscale simulations of particulate flows with parallel distributed Lagrange multiplier technique You are accessing a document from the Department of Energy's (DOE) SciTech Connect. This site is a product of DOE's Office of Scientific and Technical Information (OSTI)

  19. A set of parallel, implicit methods for a reconstructed discontinuous

    Office of Scientific and Technical Information (OSTI)

    Galerkin method for compressible flows on 3D hybrid grids (Journal Article) | SciTech Connect Journal Article: A set of parallel, implicit methods for a reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids Citation Details In-Document Search Title: A set of parallel, implicit methods for a reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids A set of implicit methods are proposed for a third-order hierarchical WENO

  20. Building the Next Generation of Parallel Applications: Co-Design

    Office of Scientific and Technical Information (OSTI)

    Opportunities and Challenges. (Conference) | SciTech Connect Building the Next Generation of Parallel Applications: Co-Design Opportunities and Challenges. Citation Details In-Document Search Title: Building the Next Generation of Parallel Applications: Co-Design Opportunities and Challenges. Abstract not provided. Authors: Heroux, Michael Allen Publication Date: 2011-04-01 OSTI Identifier: 1108313 Report Number(s): SAND2011-2822C 470544 DOE Contract Number: AC04-94AL85000 Resource Type:

  1. Clock Agreement Among Parallel Supercomputer Nodes (Dataset) | SciTech

    Office of Scientific and Technical Information (OSTI)

    Connect Dataset: Clock Agreement Among Parallel Supercomputer Nodes Citation Details In-Document Search Title: Clock Agreement Among Parallel Supercomputer Nodes This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines

  2. Interface for Parallel I/O from Componentized Visualization Algorithms

    Energy Science and Technology Software Center (OSTI)

    2008-09-16

    The software is an interface layer over file I/O with features specifically designed for efficient parallel reads and writes. The interface provides multiple concrete implementations that easily allow the replacement of one interface with another. This feature allows a reader or writer implementation to work independently of whether parallel file I/O is available or desired. The software also contains extensions to some readers to allow it to use the file I/O functionality.
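    The design described above, an I/O interface with interchangeable implementations so a reader or writer works the same whether parallel file I/O is available or not, can be sketched as follows. All class and method names here are illustrative assumptions, not the software's actual API.

    ```python
    from abc import ABC, abstractmethod

    class FileIO(ABC):
        """Abstract interface that readers and writers code against."""

        @abstractmethod
        def read(self, offset: int, length: int) -> bytes: ...

        @abstractmethod
        def write(self, offset: int, data: bytes) -> None: ...

    class SerialFileIO(FileIO):
        """Plain-file fallback used when parallel I/O is unavailable or unwanted."""

        def __init__(self, path: str):
            self._f = open(path, "r+b")

        def read(self, offset: int, length: int) -> bytes:
            self._f.seek(offset)
            return self._f.read(length)

        def write(self, offset: int, data: bytes) -> None:
            self._f.seek(offset)
            self._f.write(data)

    # A parallel implementation (e.g. one backed by MPI-IO) would provide the
    # same two methods; callers like this one work unchanged with either backend.
    def load_block(io: FileIO, offset: int, length: int) -> bytes:
        return io.read(offset, length)
    ```

    Because callers depend only on the abstract interface, swapping the concrete implementation requires no changes to reader or writer code.
    
    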

  3. Multiscale Molecular Simulations at the Petascale (Parallelization of

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Reactive Force Field Model for Blue Gene/Q): ALCF-2 Early Science Program Technical Report (Technical Report) | SciTech Connect Multiscale Molecular Simulations at the Petascale (Parallelization of Reactive Force Field Model for Blue Gene/Q): ALCF-2 Early Science Program Technical Report Citation Details In-Document Search Title: Multiscale Molecular Simulations at the Petascale (Parallelization of Reactive Force Field Model for Blue Gene/Q): ALCF-2 Early Science Program Technical Report

  4. Chassis Dynamometer Testing of Parallel and Series Diesel Hybrid Buses |

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Department of Energy Chassis Dynamometer Testing of Parallel and Series Diesel Hybrid Buses Chassis Dynamometer Testing of Parallel and Series Diesel Hybrid Buses Emissions and fuel economy data were studied from tests on four diesel and diesel hybrid transit buses using the Houston Metro Bus Cycle. p-16_muncrief.pdf More Documents & Publications Design of Integrated Laboratory and Heavy-Duty Emissions Testing Center Combining Biodiesel and EGR for Low-Temperature NOx and PM

  5. The structural simulation toolkit: a tool for exploring parallel

    Office of Scientific and Technical Information (OSTI)

    architectures and applications. (Conference) | SciTech Connect The structural simulation toolkit: a tool for exploring parallel architectures and applications. Citation Details In-Document Search Title: The structural simulation toolkit: a tool for exploring parallel architectures and applications. No abstract prepared. Authors: Kogge, Peter [1] ; Murphy, Richard C. ; Rodrigues, Arun F. ; Underwood, Keith Douglas + Show Author Affiliations (University of Notre Dame, Notre Dame, IN) Publication

  6. Mesoscale Simulations of Particulate Flows with Parallel Distributed

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Lagrange Multiplier Technique (Conference) | SciTech Connect Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique Citation Details In-Document Search Title: Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique Fluid particulate flows are common phenomena in nature and industry. Modeling of such flows at micro and macro levels as well as establishing relationships between these approaches are needed to

  7. WAS THE SUN BORN IN A MASSIVE CLUSTER?

    SciTech Connect (OSTI)

    Dukes, Donald; Krumholz, Mark R.

    2012-07-20

    A number of authors have argued that the Sun must have been born in a cluster of no more than several thousand stars, on the basis that, in a larger cluster, close encounters between the Sun and other stars would have truncated the outer solar system or excited the outer planets into eccentric orbits. However, this dynamical limit is in tension with meteoritic evidence that the solar system was exposed to a nearby supernova during or shortly after its formation; a several-thousand-star cluster is much too small to produce a massive star whose lifetime is short enough to have provided the enrichment. In this paper, we revisit the dynamical limit in the light of improved observations of the properties of young clusters. We use a series of scattering simulations to measure the velocity-dependent cross-section for disruption of the outer solar system by stellar encounters, and use this cross-section to compute the probability of a disruptive encounter as a function of birth cluster properties. We find that, contrary to prior work, the probability of disruption is small regardless of the cluster mass, and that it actually decreases rather than increases with cluster mass. Our results differ from prior work for three main reasons: (1) unlike in most previous work, we compute a velocity-dependent cross-section and properly integrate over the cluster mass-dependent velocity distribution of incoming stars; (2) we recognize that ≈90% of clusters have lifetimes of a few crossing times, rather than the 10-100 Myr adopted in many earlier models; and (3) following recent observations, we adopt a mass-independent surface density for embedded clusters, rather than a mass-independent radius as assumed in many earlier papers. Our results remove the tension between the dynamical limit and the meteoritic evidence, and suggest that the Sun was born in a massive cluster.
A corollary to this result is that close encounters in the Sun's birth cluster are highly unlikely to truncate the Kuiper Belt unless the Sun was born in one of the unusual clusters that survived for tens of Myr. However, we find that encounters could plausibly produce highly eccentric Kuiper Belt objects such as Sedna.
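    The calculation outlined in this abstract, averaging a velocity-dependent cross-section over the incoming-star velocity distribution and converting the resulting encounter rate into a probability via P = 1 - exp(-n⟨σv⟩t), can be illustrated with a toy numerical sketch. The function name, unit conventions, and every input number below are illustrative assumptions, not values from the paper.

    ```python
    import math

    def disruption_probability(n_pc3, sigma_of_v, velocities_kms, lifetime_yr):
        """Toy estimate of P(disruptive encounter) = 1 - exp(-n * <sigma v> * t).

        n_pc3          -- stellar number density of the birth cluster (stars / pc^3)
        sigma_of_v     -- disruption cross-section in pc^2 as a function of v (km/s)
        velocities_kms -- samples drawn from the cluster's velocity distribution
        lifetime_yr    -- time the Sun spends in the cluster (years)
        """
        KM_TO_PC = 1.0 / 3.086e13   # one km expressed in parsecs
        YR_TO_S = 3.156e7           # seconds per year
        # Average sigma(v) * v over the sampled velocity distribution (pc^3 / s).
        sv = sum(sigma_of_v(v) * v * KM_TO_PC for v in velocities_kms) / len(velocities_kms)
        rate = n_pc3 * sv           # disruptive encounters per second
        return 1.0 - math.exp(-rate * lifetime_yr * YR_TO_S)
    ```

    With illustrative inputs (a constant cross-section, a short cluster lifetime of a few crossing times), the probability comes out small, in the spirit of the abstract's conclusion.
    
    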

  8. Sort-First, Distributed Memory Parallel Visualization and Rendering

    SciTech Connect (OSTI)

    Bethel, E. Wes; Humphreys, Greg; Paul, Brian; Brederson, J. Dean

    2003-07-15

    While commodity computing and graphics hardware has increased in capacity and dropped in cost, it is still quite difficult to make effective use of such systems for general-purpose parallel visualization and graphics. We describe the results of a recent project that provides a software infrastructure suitable for general-purpose use by parallel visualization and graphics applications. Our work combines and extends two technologies: Chromium, a stream-oriented framework that implements the OpenGL programming interface; and OpenRM Scene Graph, a pipelined-parallel scene graph interface for graphics data management. Using this combination, we implement a sort-first, distributed memory, parallel volume rendering application. We describe the performance characteristics in terms of bandwidth requirements and highlight key algorithmic considerations needed to implement the sort-first system. We characterize system performance using a distributed memory parallel volume rendering application, and present performance gains realized by using scene specific knowledge to accelerate rendering through reduced network bandwidth. The contribution of this work is an exploration of general-purpose, sort-first architecture performance characteristics as applied to distributed memory, commodity hardware, along with a description of the algorithmic support needed to realize parallel, sort-first implementations.
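    The sort-first scheme this abstract describes partitions screen space among render nodes and routes each primitive, before rasterization, to every node whose screen tile it overlaps. A minimal sketch of that routing step follows; the function names are hypothetical, not Chromium's or OpenRM's API.

    ```python
    def route_primitives(prims, tiles):
        """Sort-first routing: assign each primitive (a screen-space bounding
        box) to every tile it overlaps, before any rasterization happens.

        prims, tiles -- rectangles as (x0, y0, x1, y1); one tile per render node.
        Returns one list of primitive indices per node.
        """
        def overlaps(a, b):
            # Two axis-aligned rectangles overlap iff they overlap on both axes.
            return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

        return [[i for i, p in enumerate(prims) if overlaps(p, t)] for t in tiles]
    ```

    A primitive straddling a tile boundary is sent to both nodes, which is the source of the extra network bandwidth that sort-first systems work to reduce.
    
    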

  9. Characterizing the convective velocity fields in massive stars

    SciTech Connect (OSTI)

    Chatzopoulos, Emmanouil; Graziani, Carlo; Couch, Sean M.

    2014-11-01

    We apply the mathematical formalism of vector spherical harmonics decomposition to convective stellar velocity fields from multidimensional hydrodynamics simulations and show that the resulting power spectra furnish a robust and stable statistical description of stellar convective turbulence. Analysis of the power spectra helps identify key physical parameters of the convective process such as the dominant scale of the turbulent motions that influence the structure of massive evolved pre-supernova stars. We introduce the numerical method that can be used to calculate vector spherical harmonics power spectra from two-dimensional (2D) and three-dimensional (3D) convective shell simulation data. Using this method we study the properties of oxygen shell burning and convection for a 15 M☉ star simulated by the hydrodynamics code FLASH in 2D and 3D. We discuss the importance of realistic initial conditions to achieving successful core-collapse supernova explosions in multidimensional simulations. We show that the calculated power spectra can be used to generate realizations of the velocity fields of presupernova convective shells. We find that the slope of the solenoidal mode power spectrum remains mostly constant throughout the evolution of convection in the oxygen shell in both 2D and 3D simulations. We also find that the characteristic radial scales of the convective elements are smaller in 3D than in 2D, while the angular scales are larger in 3D.

  10. Pair instability supernovae of very massive population III stars

    SciTech Connect (OSTI)

    Chen, Ke-Jung; Woosley, Stan; Heger, Alexander; Almgren, Ann; Whalen, Daniel J.

    2014-09-01

    Numerical studies of primordial star formation suggest that the first stars in the universe may have been very massive. Stellar models indicate that non-rotating Population III stars with initial masses of 140-260 M☉ die as highly energetic pair-instability supernovae. We present new two-dimensional simulations of primordial pair-instability supernovae done with the CASTRO code. Our simulations begin at earlier times than previous multidimensional models, at the onset of core contraction, to capture any dynamical instabilities that may be seeded by core contraction and explosive burning. Such instabilities could enhance explosive yields by mixing hot ash with fuel, thereby accelerating nuclear burning, and affect the spectra of the supernova by dredging up heavy elements from greater depths in the star at early times. Our grid of models includes both blue supergiants and red supergiants over the range in progenitor mass expected for these events. We find that fluid instabilities driven by oxygen and helium burning arise at the upper and lower boundaries of the oxygen shell ~20-100 s after core bounce. Instabilities driven by burning freeze out after the SN shock exits the helium core. As the shock later propagates through the hydrogen envelope, a strong reverse shock forms that drives the growth of Rayleigh-Taylor instabilities. In red supergiant progenitors, the amplitudes of these instabilities are sufficient to mix the supernova ejecta.

  11. PROTOSTELLAR OUTFLOWS AND RADIATIVE FEEDBACK FROM MASSIVE STARS

    SciTech Connect (OSTI)

    Kuiper, Rolf; Yorke, Harold W.; Turner, Neal J. E-mail: Harold.W.Yorke@jpl.nasa.gov

    2015-02-20

    We carry out radiation hydrodynamical simulations of the formation of massive stars in the super-Eddington regime including both their radiative feedback and protostellar outflows. The calculations start from a prestellar core of dusty gas and continue until the star stops growing. The accretion ends when the remnants of the core are ejected, mostly by the force of the direct stellar radiation in the polar direction and elsewhere by the reradiated thermal infrared radiation. How long the accretion persists depends on whether the protostellar outflows are present. We set the mass outflow rate to 1% of the stellar sink particle's accretion rate. The outflows open a bipolar cavity extending to the core's outer edge, through which the thermal radiation readily escapes. The radiative flux is funneled into the polar directions while the core's collapse proceeds near the equator. The outflow thus extends the ''flashlight effect'', or anisotropic radiation field, found in previous studies from the few hundred AU scale of the circumstellar disk up to the 0.1 parsec scale of the core. The core's flashlight effect allows core gas to accrete on the disk for longer, in the same way that the disk's flashlight effect allows disk gas to accrete on the star for longer. Thus although the protostellar outflows remove material near the core's poles, causing slower stellar growth over the first few free-fall times, they also enable accretion to go on longer in our calculations. The outflows ultimately lead to stars of somewhat higher mass.

  12. HERSCHEL REVEALS MASSIVE COLD CLUMPS IN NGC 7538

    SciTech Connect (OSTI)

    Fallscheer, C.; Di Francesco, J.; Sadavoy, S.; Reid, M. A.; Martin, P. G.; Nguyen-Luong, Q.; Hill, T.; Hennemann, M.; Motte, F.; Men'shchikov, A.; Andre, Ph.; Konyves, V.; Sauvage, M.; Griffin, M.; Rygl, K. L. J.; Benedettini, M.; Schneider, N.; Anderson, L. D. [Laboratoire d'Astrophysique de Marseille, CNRS] and others

    2013-08-20

    We present the first overview of the Herschel observations of the nearby high-mass star-forming region NGC 7538, taken as part of the Herschel imaging study of OB young stellar objects (HOBYS) Key Programme. These PACS and SPIRE maps cover an approximate area of one square degree at five submillimeter and far-infrared wavebands. We have identified 780 dense sources and classified 224 of those. With the intention of investigating the existence of cold massive starless or class 0-like clumps that would have the potential to form intermediate- to high-mass stars, we further isolate 13 clumps as the most likely candidates for follow-up studies. These 13 clumps have masses in excess of 40 M☉ and temperatures below 15 K. They range in size from 0.4 pc to 2.5 pc and have densities between 3 × 10³ cm⁻³ and 4 × 10⁴ cm⁻³. Spectral energy distributions are then used to characterize their energetics and evolutionary state through a luminosity-mass diagram. NGC 7538 has a highly filamentary structure, previously unseen in the dust continuum of existing submillimeter surveys. We report the most complete imaging to date of a large, evacuated ring of material in NGC 7538 which is bordered by many cool sources.

  13. Scientists say climate change could cause a 'massive' tree die-off in

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    the U.S. Southwest A troubling new study says a warming climate could trigger a "massive" die-off of coniferous trees in the U.S. Southwest sometime this century. December 24, 2015 Dying conifers, particularly ponderosa pine (Pinus ponderosa) and sugar pine (Pinus lambertiana) in California's Sequoia National Park,

  14. Monumental effort: How a dedicated team completed a massive beam-box

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    relocation for the NSTX upgrade | Princeton Plasma Physics Lab Monumental effort: How a dedicated team completed a massive beam-box relocation for the NSTX upgrade By John Greenwald December 8, 2014 An overhead crane lifts the massive box into the NSTX-U test cell (Photo by Mike Viola) Gallery: Overview of the NSTX-U test cell with the second neutral beam box installed at upper

  15. A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; Shephard, Mark S.

    2013-01-01

    Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm of creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in a 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.
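    The core idea in this abstract, materializing ghost copies of boundary entities using only neighbor-to-neighbor exchange rather than all-to-all communication, can be sketched on a toy 1D partitioned mesh. This is an illustrative simplification, not the Flexible Distributed Mesh Database algorithm itself.

    ```python
    def create_ghost_layer(partitions):
        """Create one ghost layer on a 1D partitioned mesh.

        partitions -- list of lists of owned cell ids, one list per rank,
                      in rank order, so rank r's neighbors are r-1 and r+1.
        Returns, per rank, a pair (owned, ghosts) where the ghosts are
        boundary cells copied from adjacent ranks only -- mimicking
        neighborhood communication instead of all-to-all exchange.
        """
        result = []
        for rank, owned in enumerate(partitions):
            ghosts = []
            if rank > 0:                        # copy left neighbor's right boundary
                ghosts.append(partitions[rank - 1][-1])
            if rank < len(partitions) - 1:      # copy right neighbor's left boundary
                ghosts.append(partitions[rank + 1][0])
            result.append((owned, ghosts))
        return result
    ```

    After ghosting, each rank can evaluate a stencil over its owned cells without any further communication, which is the performance motivation the abstract describes.
    
    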

  16. Verification of runaway migration in a massive disk

    SciTech Connect (OSTI)

    Li, Shengtai

    2009-01-01

    Runaway migration of a proto-planet was first proposed and observed by Masset and Papaloizou (2003). The semi-major axis of the proto-planet varies by 50% over just a few tens of orbits when runaway migration happens. More recent work by D'Angelo et al. (2005) solved the same problem with a locally refined grid and found that the migration rate is sharply reduced and no runaway occurs when the grid cells surrounding the planet are refined enough. To verify these two seemingly contradictory results, we independently perform high-resolution simulations, solving the same problem as Masset and Papaloizou (2003), with and without self-gravity. We find that the migration rate is highly dependent on the softening used in the gravitational force between the disk and planet. When a small softening is used in a 2D massive disk, the mass of the circumplanetary disk (CPD) increases with time with enough resolution in the CPD region. It acts as if mass is continually accreted onto the CPD, which does not settle down until after thousands of orbits. If the planet is held on a fixed orbit long enough, the mass of the CPD will become so large that the condition for runaway migration derived in Masset (2008) will not be satisfied, and hence runaway migration will not be triggered. However, when a large softening is used, the mass of the CPD will begin to decrease after the initial increase stage. Our numerical results with and without disk-gravity confirm that the runaway migration indeed exists when the mass deficit is larger than the total mass of the planet and CPD. Our simulation results also show that the torque from the co-orbital region, in particular the planet's Hill sphere, is the main contributor to the runaway migration, and the CPD, which lags behind the planet, becomes so asymmetric that it accelerates the migration.

  17. Progress on H5Part: A Portable High Performance Parallel DataInterface...

    Office of Scientific and Technical Information (OSTI)

    Performance Parallel DataInterface for Electromagnetics Simulations Citation Details In-Document Search Title: Progress on H5Part: A Portable High Performance Parallel ...

  18. Parallel halo finding in N-body cosmology simulations

    SciTech Connect (OSTI)

    Pfitzner, D.W.; Salmon, J.K.

    1996-12-31

    Cosmological N-body simulations on parallel computers produce large datasets - about five hundred Megabytes at a single output time, or tens of Gigabytes over the course of a simulation. These large datasets require further analysis before they can be compared to astronomical observations. We have implemented two methods for performing halo finding, a key part of the knowledge discovery process, on parallel machines. One of these is a parallel implementation of the friends of friends (FOF) algorithm, widely used in the field of N-body cosmology. The new isodensity (ID) method has been developed to overcome some of the shortcomings of FOF. Both have been implemented on a variety of computer systems, and successfully used to extract halos from simulations with up to 256³ (or about 16.8 million) particles, which are among the largest N-body cosmology simulations in existence.
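    The friends-of-friends (FOF) algorithm the abstract references links any two particles closer than a fixed linking length into the same halo, and halos are the transitive closure of those links. A serial sketch using a union-find structure, not the paper's parallel implementation, might look like:

    ```python
    def fof_halos(points, b):
        """Group particles into friends-of-friends halos.

        points -- list of coordinate tuples; b -- linking length.
        Returns the halos as sorted lists of particle indices.
        (O(n^2) pairwise search; production codes use spatial trees.)
        """
        n = len(points)
        parent = list(range(n))

        def find(i):                      # union-find with path halving
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(n):
            for j in range(i + 1, n):
                # Link i and j if their separation is within the linking length.
                if sum((a - c) ** 2 for a, c in zip(points[i], points[j])) <= b * b:
                    parent[find(i)] = find(j)

        groups = {}
        for i in range(n):
            groups.setdefault(find(i), []).append(i)
        return sorted(sorted(g) for g in groups.values())
    ```

    Note how the "friends of friends" name falls out of the union step: two particles far apart still share a halo if a chain of close pairs connects them.
    
    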

  19. Parallel vacuum arc discharge with microhollow array dielectric and anode

    SciTech Connect (OSTI)

    Feng, Jinghua; Zhou, Lin; Fu, Yuecheng; Zhang, Jianhua; Xu, Rongkun; Chen, Faxin; Li, Linbo; Meng, Shijian

    2014-07-15

    An electrode configuration with microhollow array dielectric and anode was developed to obtain parallel vacuum arc discharge. Compared with the conventional electrodes, more than 10 parallel microhollow discharges were ignited for the new configuration, which increased the discharge area significantly and made the cathode eroded more uniformly. The vacuum discharge channel number could be increased effectively by decreasing the distances between holes or increasing the arc current. Experimental results revealed that plasmas ejected from the adjacent hollow and the relatively high arc voltage were two key factors leading to the parallel discharge. The characteristics of plasmas in the microhollow were investigated as well. The spectral line intensity and electron density of plasmas in microhollow increased obviously with the decrease of the microhollow diameter.

  20. Parallel Scaling Characteristics of Selected NERSC User ProjectCodes

    SciTech Connect (OSTI)

    Skinner, David; Verdier, Francesca; Anand, Harsh; Carter,Jonathan; Durst, Mark; Gerber, Richard

    2005-03-05

    This report documents parallel scaling characteristics of NERSC user project codes between Fiscal Year 2003 and the first half of Fiscal Year 2004 (Oct 2002-March 2004). The codes analyzed cover 60% of all the CPU hours delivered during that time frame on seaborg, a 6080 CPU IBM SP and the largest parallel computer at NERSC. The scale in terms of concurrency and problem size of the workload is analyzed. Drawing on batch queue logs, performance data and feedback from researchers, we detail the motivations, benefits, and challenges of implementing highly parallel scientific codes on current NERSC High Performance Computing systems. An evaluation and outlook of the NERSC workload for Allocation Year 2005 is presented.

  1. Parallel garbage collection on a virtual memory system

    SciTech Connect (OSTI)

    Abraham, S.G.; Patel, J.H.

    1987-01-01

    Since most artificial intelligence applications are programmed in list processing languages, it is important to design architectures to support efficient garbage collection. This paper presents an architecture and an associated algorithm for parallel garbage collection on a virtual memory system. All the previously proposed parallel algorithms attempt to collect cells released by the list processor during the garbage collection cycle. We do not attempt to collect such cells. As a consequence, the list processor incurs little overhead in the proposed scheme, since it need not synchronize with the collector. Most parallel algorithms are designed for shared memory machines which have certain implicit synchronization functions on variable access. The proposed algorithm is designed for virtual memory systems where both the list processor and the garbage collector have private memories. The enforcement of coherence between the two private memories can be expensive and is not necessary in our scheme. 15 refs., 3 figs.

  2. Analytic structure of the self-energy for massive gauge bosons...

    Office of Scientific and Technical Information (OSTI)

    bosons at finite temperature Citation Details In-Document Search Title: Analytic structure of the self-energy for massive gauge bosons at finite temperature We show that the ...

  3. Methods for operating parallel computing systems employing sequenced communications

    DOE Patents [OSTI]

    Benner, R.E.; Gustafson, J.L.; Montry, G.R.

    1999-08-10

    A parallel computing system and method are disclosed having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system. 15 figs.

  4. Small file aggregation in a parallel computing system

    DOE Patents [OSTI]

    Faibish, Sorin; Bent, John M.; Tzelnic, Percy; Grider, Gary; Zhang, Jingwang

    2014-09-02

    Techniques are provided for small file aggregation in a parallel computing system. An exemplary method for storing a plurality of files generated by a plurality of processes in a parallel computing system comprises aggregating the plurality of files into a single aggregated file; and generating metadata for the single aggregated file. The metadata comprises an offset and a length of each of the plurality of files in the single aggregated file. The metadata can be used to unpack one or more of the files from the single aggregated file.
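    The aggregation scheme in this abstract, concatenating many small per-process files into a single aggregated file whose metadata records an (offset, length) pair per original file, can be sketched in a few lines. The function names here are illustrative, not the patent's terminology.

    ```python
    def aggregate(files):
        """Pack many small files into one aggregated blob.

        files -- mapping of file name to bytes content.
        Returns (blob, metadata) where metadata maps each name to the
        (offset, length) of its content inside the blob.
        """
        blob, meta, offset = b"", {}, 0
        for name, data in files.items():
            meta[name] = (offset, len(data))
            blob += data
            offset += len(data)
        return blob, meta

    def unpack(blob, meta, name):
        """Recover one original file from the aggregated blob via its metadata."""
        offset, length = meta[name]
        return blob[offset:offset + length]
    ```

    The metadata is what makes the scheme practical: any single file can be recovered with one ranged read of the aggregated file, without scanning the rest.
    
    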

  5. Methods for operating parallel computing systems employing sequenced communications

    DOE Patents [OSTI]

    Benner, Robert E. (Albuquerque, NM); Gustafson, John L. (Albuquerque, NM); Montry, Gary R. (Albuquerque, NM)

    1999-01-01

    A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.

  6. Global synchronization of parallel processors using clock pulse width modulation

    DOE Patents [OSTI]

    Chen, Dong; Ellavsky, Matthew R.; Franke, Ross L.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Jeanson, Mark J.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Littrell, Daniel; Ohmacht, Martin; Reed, Don D.; Schenck, Brandon E.; Swetz, Richard A.

    2013-04-02

    A circuit generates a global clock signal with a pulse width modification to synchronize processors in a parallel computing system. The circuit may include a hardware module and a clock splitter. The hardware module may generate a clock signal and performs a pulse width modification on the clock signal. The pulse width modification changes a pulse width within a clock period in the clock signal. The clock splitter may distribute the pulse width modified clock signal to a plurality of processors in the parallel computing system.

  7. New Cosmologies on the Horizon. Cosmology and Holography in bigravity and massive gravity

    SciTech Connect (OSTI)

    Tolley, Andrew James

    2013-03-31

    The goal of this research program is to explore the cosmological dynamics, the nature of cosmological and black hole horizons, and the role of holography in a new class of infrared modified theories of gravity. This will capitalize on the considerable recent progress in our understanding of the dynamics of massive spin two fields on curved spacetimes, culminating in the formulation of the first fully consistent theories of massive gravity and bigravity/bimetric theories.

  8. Massive Energy Storage in Superconductors (SMES) | U.S. DOE Office of

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Science (SC) Massive Energy Storage in Superconductors (SMES) High Energy Physics (HEP) 08.01.13 Massive Energy Storage in Superconductors (SMES) Novel high

  9. Massive Energy Storage in Superconductors (SMES) | U.S. DOE Office of

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Science (SC), Nuclear Physics (NP). 08.01.13 Massive Energy Storage in Superconductors (SMES)

  10. Massive-scale RDF Processing Using Compressed Bitmap Indexes

    SciTech Connect (OSTI)

    Madduri, Kamesh; Wu, Kesheng

    2011-05-26

    The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins efficiently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This paper makes three new contributions. (i) We present an efficient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to efficiently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We find that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.
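
    The dictionary-plus-bitmap scheme described above can be illustrated with a minimal serial sketch (all names hypothetical, with plain Python ints standing in for compressed bitmaps): each unique term is dictionary-encoded to an integer, and a query pattern is answered by ANDing per-position bitmaps rather than performing an explicit join.

    ```python
    # Sketch, not the paper's implementation: dictionary-encode RDF triples and
    # answer a single SPARQL-style triple pattern via bitmap intersection.

    class TripleStore:
        def __init__(self):
            self.dict = {}      # entity/URI -> integer id (the dictionary)
            self.triples = []   # list of (s, p, o) id triples

        def _id(self, term):
            return self.dict.setdefault(term, len(self.dict))

        def add(self, s, p, o):
            self.triples.append((self._id(s), self._id(p), self._id(o)))

        def bitmap(self, position, term):
            """Bitmap (as an int) marking triples whose `position` equals `term`."""
            tid = self.dict.get(term)
            bm = 0
            for i, t in enumerate(self.triples):
                if t[position] == tid:
                    bm |= 1 << i
            return bm

        def match(self, s=None, p=None, o=None):
            """Triples matching all bound terms, via AND of per-term bitmaps."""
            bm = (1 << len(self.triples)) - 1   # start with all-ones
            for pos, term in enumerate((s, p, o)):
                if term is not None:
                    bm &= self.bitmap(pos, term)
            return [self.triples[i] for i in range(len(self.triples)) if bm >> i & 1]
    ```

    A real system would use compressed bitmaps (e.g. word-aligned hybrid encoding) and build the index in parallel, but the intersection step is the same idea.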

  11. Building a Parallel Cloud Storage System using OpenStacks Swift Object Store and Transformative Parallel I/O

    SciTech Connect (OSTI)

    Burns, Andrew J.; Lora, Kaleb D.; Martinez, Esteban; Shorter, Martel L.

    2012-07-30

    Our project consists of bleeding-edge research into replacing traditional storage archives with a parallel, cloud-based storage solution built on OpenStack's Swift Object Store. We benchmarked Swift for write speed and scalability. Our project is unique because Swift is typically used for reads, while we are mostly concerned with write speeds. Cloud storage is a viable archive solution because: (1) container management for larger parallel archives might ease the migration workload; (2) many tools written for cloud storage could be utilized for a local archive; and (3) current large-scale cloud storage practices in industry could be utilized to manage a scalable archive solution.

  12. Hardware packet pacing using a DMA in a parallel computer

    DOE Patents [OSTI]

    Chen, Dong; Heidelberger, Phillip; Vranas, Pavlos

    2013-08-13

    Method and system for hardware packet pacing using a direct memory access controller in a parallel computer which, in one aspect, keeps track of a total number of bytes put on the network as a result of a remote get operation, using a hardware token counter.
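
    The token-counter idea behind the pacing can be sketched in a few lines (a software analogy of the hardware counter; names and sizes hypothetical): each send draws bytes from a fixed in-flight budget, and acknowledgements return them.

    ```python
    # Sketch of hardware packet pacing via a token counter (hypothetical names):
    # the counter bounds the bytes in flight from a remote-get operation.

    class PacketPacer:
        def __init__(self, max_inflight_bytes):
            self.tokens = max_inflight_bytes  # bytes we may still put on the network

        def try_send(self, nbytes):
            """Inject a packet only if it fits within the in-flight budget."""
            if nbytes <= self.tokens:
                self.tokens -= nbytes
                return True
            return False  # caller retries once acknowledgements free tokens

        def ack(self, nbytes):
            """An acknowledged packet returns its tokens to the budget."""
            self.tokens += nbytes
    ```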

  13. An intercalation-locked parallel-stranded DNA tetraplex

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Tripathi, S.; Zhang, D.; Paukstelis, P. J.

    2015-01-27

    DNA has proved to be an excellent material for nanoscale construction because complementary DNA duplexes are programmable and structurally predictable. However, in the absence of Watson–Crick pairings, DNA can be structurally more diverse. Here, we describe the crystal structures of d(ACTCGGATGAT) and the brominated derivative, d(ACBrUCGGABrUGAT). These oligonucleotides form parallel-stranded duplexes with a crystallographically equivalent strand, resulting in the first examples of DNA crystal structures that contain four different symmetric homo base pairs. Two of the parallel-stranded duplexes are coaxially stacked in opposite directions and locked together to form a tetraplex through intercalation of the 5'-most A–A base pairs between adjacent G–G pairs in the partner duplex. The intercalation region is a new type of DNA tertiary structural motif with similarities to the i-motif. 1H–1H nuclear magnetic resonance and native gel electrophoresis confirmed the formation of a parallel-stranded duplex in solution. Finally, we modified specific nucleotide positions and added d(GAY) motifs to oligonucleotides and were readily able to obtain similar crystals. This suggests that this parallel-stranded DNA structure may be useful in the rational design of DNA crystals and nanostructures.

  14. Parallel heat transport in integrable and chaotic magnetic fields

    SciTech Connect (OSTI)

    Del-Castillo-Negrete, Diego B [ORNL; Chacon, Luis [ORNL

    2012-01-01

    The study of transport in magnetized plasmas is a problem of fundamental interest in controlled fusion, space plasmas, and astrophysics research. Three issues make this problem particularly challenging: (i) the extreme anisotropy between the parallel (i.e., along the magnetic field) conductivity, χ∥, and the perpendicular conductivity, χ⊥ (χ∥/χ⊥ may exceed 10^10 in fusion plasmas); (ii) magnetic field line chaos, which in general complicates (and may preclude) the construction of magnetic field line coordinates; and (iii) nonlocal parallel transport in the limit of small collisionality. Motivated by these issues, we present a Lagrangian Green's function method to solve the local and non-local parallel transport equation applicable to integrable and chaotic magnetic fields in arbitrary geometry. The method avoids by construction the numerical pollution issues of grid-based algorithms. The potential of the approach is demonstrated with nontrivial applications to integrable (magnetic island chain), weakly chaotic (devil's staircase), and fully chaotic magnetic field configurations. For the latter, numerical solutions of the parallel heat transport equation show that the effective radial transport, with local and non-local closures, is non-diffusive, thus casting doubt on the applicability of quasilinear diffusion descriptions. General conditions for the existence of non-diffusive, multivalued flux-gradient relations in the temperature evolution are derived.

  15. Xyce Parallel Electronic Simulator Users Guide Version 6.2.

    SciTech Connect (OSTI)

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2014-09-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message-passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory, and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks: The information herein is subject to change without notice. Copyright © 2002-2014 Sandia Corporation. All rights reserved. Xyce(TM) Electronic Simulator and Xyce(TM) are trademarks of Sandia Corporation. Portions of the Xyce(TM) code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved.
Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce%40sandia.gov (outside Sandia) xyce-sandia%40sandia.gov (Sandia only)

  16. The MASSIVE survey. I. A volume-limited integral-field spectroscopic study of the most massive early-type galaxies within 108 Mpc

    SciTech Connect (OSTI)

    Ma, Chung-Pei [Department of Astronomy, University of California, Berkeley, CA 94720 (United States); Greene, Jenny E.; Murphy, Jeremy D. [Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544 (United States); McConnell, Nicholas [Institute for Astronomy, University of Hawaii at Manoa, Honolulu, HI 96822 (United States); Janish, Ryan [Department of Physics, University of California, Berkeley, CA 94720 (United States); Blakeslee, John P. [Dominion Astrophysical Observatory, NRC Herzberg Institute of Astrophysics, Victoria, BC V9E 2E7 (Canada); Thomas, Jens, E-mail: cpma@berkeley.edu [Max Planck-Institute for Extraterrestrial Physics, Giessenbachstr. 1, D-85741 Garching (Germany)

    2014-11-10

    Massive early-type galaxies represent the modern day remnants of the earliest major star formation episodes in the history of the universe. These galaxies are central to our understanding of the evolution of cosmic structure, stellar populations, and supermassive black holes, but the details of their complex formation histories remain uncertain. To address this situation, we have initiated the MASSIVE Survey, a volume-limited, multi-wavelength, integral-field spectroscopic (IFS) and photometric survey of the structure and dynamics of the ~100 most massive early-type galaxies within a distance of 108 Mpc. This survey probes a stellar mass range M* ≳ 10^11.5 M_⊙ and diverse galaxy environments that have not been systematically studied to date. Our wide-field IFS data cover about two effective radii of individual galaxies, and for a subset of them, we are acquiring additional IFS observations on sub-arcsecond scales with adaptive optics. We are also acquiring deep K-band imaging to trace the extended halos of the galaxies and measure accurate total magnitudes. Dynamical orbit modeling of the combined data will allow us to simultaneously determine the stellar, black hole, and dark matter halo masses. The primary goals of the project are to constrain the black hole scaling relations at high masses, investigate systematically the stellar initial mass function and dark matter distribution in massive galaxies, and probe the late-time assembly of ellipticals through stellar population and kinematical gradients. In this paper, we describe the MASSIVE sample selection, discuss the distinct demographics and structural and environmental properties of the selected galaxies, and provide an overview of our basic observational program, science goals and early survey results.

  17. Scalable parallel solution coupling for multi-physics reactor simulation.

    SciTech Connect (OSTI)

    Tautges, T. J.; Caceres, A.; Mathematics and Computer Science

    2009-01-01

    Reactor simulation depends on the coupled solution of various physics types, including neutronics, thermal/hydraulics, and structural mechanics. This paper describes the formulation and implementation of a parallel solution coupling capability being developed for reactor simulation. The coupling process consists of mesh and coupler initialization, point location, field interpolation, and field normalization. We report here our test of this capability on an example problem, namely, a reflector assembly from an advanced burner test reactor. Performance of this coupler in parallel is reasonable for the chosen problem size and range of processor counts. The runtime is dominated by startup costs, which amortize over the entire coupled simulation. Future efforts will include adding more sophisticated interpolation and normalization methods, to accommodate different numerical solvers used in various physics modules and to obtain better conservation properties for certain field types.

  18. Parallel resistivity and ohmic heating of laboratory dipole plasmas

    SciTech Connect (OSTI)

    Fox, W.

    2012-08-15

    The parallel resistivity is calculated in the long-mean-free-path regime for the dipole plasma geometry; this is shown to be a neoclassical transport problem in the limit of a small number of circulating electrons. In this regime, the resistivity is substantially higher than the Spitzer resistivity due to the magnetic trapping of a majority of the electrons. This suggests that heating the outer flux surfaces of the plasma with low-frequency parallel electric fields can be substantially more efficient than might be naively estimated. Such a skin-current heating scheme is analyzed by deriving an equation for diffusion of skin currents into the plasma, from which quantities such as the resistive skin-depth, lumped-circuit impedance, and power deposited in the plasma can be estimated. Numerical estimates indicate that this may be a simple and efficient way to couple power into experiments in this geometry.

  19. Administering truncated receive functions in a parallel messaging interface

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-12-09

    Administering truncated receive functions in a parallel messaging interface (`PMI`) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.
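
    The receive semantics above reduce to a simple contract, sketched here (hypothetical and simplified, with a byte string standing in for the quantity of data arriving through the PMI): the messaging layer receives everything, delivers only the portion the application asked for, and discards the rest.

    ```python
    # Sketch of truncated-receive semantics (not the patented implementation):
    # receive the full quantity of data, deliver `keep` bytes, discard the rest.

    def truncated_receive(incoming: bytes, keep: int):
        """Return (delivered portion, number of discarded bytes)."""
        delivered = incoming[:keep]          # portion the application specified
        discarded = len(incoming) - len(delivered)  # portion to be discarded
        return delivered, discarded
    ```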

  20. Final Report: Center for Programming Models for Scalable Parallel Computing

    SciTech Connect (OSTI)

    Mellor-Crummey, John

    2011-09-13

    As part of the Center for Programming Models for Scalable Parallel Computing, Rice University collaborated with project partners in the design, development and deployment of language, compiler, and runtime support for parallel programming models to support application development for the leadership-class computer systems at DOE national laboratories. Work over the course of this project has focused on the design, implementation, and evaluation of a second-generation version of Coarray Fortran. Research and development efforts of the project have focused on the CAF 2.0 language, compiler, runtime system, and supporting infrastructure. This has involved working with the teams that provide infrastructure for CAF that we rely on, implementing new language and runtime features, producing an open source compiler that enabled us to evaluate our ideas, and evaluating our design and implementation through the use of benchmarks. The report details the research, development, findings, and conclusions from this work.

  1. Performing a local reduction operation on a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A; Faraj, Daniel A

    2013-06-04

    A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
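
    The two core steps of the claim can be illustrated with a serial sketch (a hypothetical simplification; the patent describes a multi-core hardware design): interleaving two input buffers into a shared buffer in fixed-size chunks, and the element-wise local reduction itself.

    ```python
    # Sketch of the interleaved-copy and local-reduction steps (simplified,
    # serial; the actual design splits this work across processing cores).

    def interleave(buf_a, buf_b, chunk):
        """Copy two equal-length buffers into one buffer in alternating chunks."""
        out = []
        for i in range(0, len(buf_a), chunk):
            out.extend(buf_a[i:i + chunk])
            out.extend(buf_b[i:i + chunk])
        return out

    def local_reduce(buffers):
        """Element-wise sum across equal-length input buffers."""
        return [sum(vals) for vals in zip(*buffers)]
    ```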

  2. Parallel Algorithms for Graph Optimization using Tree Decompositions

    SciTech Connect (OSTI)

    Sullivan, Blair D; Weerapurage, Dinesh P; Groer, Christopher S

    2012-06-01

    Although many NP-hard graph optimization problems can be solved in polynomial time on graphs of bounded tree-width, the adoption of these techniques into mainstream scientific computation has been limited due to the high memory requirements of the necessary dynamic programming tables and excessive runtimes of sequential implementations. This work addresses both challenges by proposing a set of new parallel algorithms for all steps of a tree decomposition-based approach to solve the maximum weighted independent set problem. A hybrid OpenMP/MPI implementation includes a highly scalable parallel dynamic programming algorithm leveraging the MADNESS task-based runtime, and computational results demonstrate scaling. This work enables a significant expansion of the scale of graphs on which exact solutions to maximum weighted independent set can be obtained, and forms a framework for solving additional graph optimization problems with similar techniques.
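
    The flavor of the underlying dynamic program can be shown in the simplest setting, an actual tree (treewidth 1), where each vertex carries a two-entry table: the best weight with that vertex included or excluded. This is a sketch of the serial special case only, not the paper's parallel tree-decomposition algorithm.

    ```python
    # Sketch: maximum weighted independent set on a tree via dynamic programming.
    # Each call returns (best weight with v included, best weight with v excluded).

    def mwis_tree(adj, weight, root=0):
        """Max weighted independent set on a tree given as an adjacency dict."""
        def dp(v, parent):
            incl, excl = weight[v], 0
            for u in adj[v]:
                if u == parent:
                    continue
                ci, ce = dp(u, v)
                incl += ce             # if v is in the set, children must be out
                excl += max(ci, ce)    # otherwise take the better choice per child
            return incl, excl
        return max(dp(root, -1))
    ```

    On general bounded-treewidth graphs the tables are indexed by subsets of a tree-decomposition bag rather than a single vertex, which is where the memory pressure the abstract mentions comes from.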

  3. Data-Parallel Mesh Connected Components Labeling and Analysis

    SciTech Connect (OSTI)

    Harrison, Cyrus; Childs, Hank; Gaither, Kelly

    2011-04-10

    We present a data-parallel algorithm for identifying and labeling the connected sub-meshes within a domain-decomposed 3D mesh. The identification task is challenging in a distributed-memory parallel setting because connectivity is transitive and the cells composing each sub-mesh may span many or all processors. Our algorithm employs a multi-stage application of the Union-find algorithm and a spatial partitioning scheme to efficiently merge information across processors and produce a global labeling of connected sub-meshes. Marking each vertex with its corresponding sub-mesh label allows us to isolate mesh features based on topology, enabling new analysis capabilities. We briefly discuss two specific applications of the algorithm and present results from a weak scaling study. We demonstrate the algorithm at concurrency levels up to 2197 cores and analyze meshes containing up to 68 billion cells.
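
    The serial kernel of this approach is the Union-find structure; a minimal sketch (hypothetical names, with path halving for efficiency) that labels the cells of a mesh by their connected sub-mesh:

    ```python
    # Sketch of connected-component labeling with Union-find (serial; the paper's
    # algorithm additionally merges information across processors in stages).

    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))

        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]  # path halving
                x = self.parent[x]
            return x

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra != rb:
                self.parent[ra] = rb

    def label_components(n_cells, adjacent_pairs):
        """Label each cell with a component id; connected cells share a label."""
        uf = UnionFind(n_cells)
        for a, b in adjacent_pairs:
            uf.union(a, b)
        return [uf.find(i) for i in range(n_cells)]
    ```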

  4. Performing a local reduction operation on a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A.; Faraj, Daniel A.

    2012-12-11

    A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.

  5. Methodology for Augmenting Existing Paths with Additional Parallel Transects

    SciTech Connect (OSTI)

    Wilson, John E.

    2013-09-30

    Visual Sample Plan (VSP) is sample planning software that is used, among other purposes, to plan transect sampling paths to detect areas that were potentially used for munition training. This module was developed for application on a large site where existing roads and trails were to be used as primary sampling paths. Gap areas between these primary paths needed to be found and covered with parallel transect paths; these gap areas represent areas on the site that are more than a specified distance from a primary path. The added parallel paths needed to optionally be connected together into a single path, the shortest path possible. The paths also needed to optionally be attached to existing primary paths, again with the shortest possible path. Finally, the process must be repeatable and predictable so that the same inputs (primary paths, specified distance, and path options) will result in the same set of new paths every time. This methodology was developed to meet those specifications.
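
    A one-dimensional simplification of the gap-finding step (hypothetical names; the actual module works on 2-D site geometry): positions farther than the specified distance from every primary path form the gap intervals that need added transects.

    ```python
    # Sketch (1-D simplification): given primary-path positions along an axis and
    # a maximum allowed distance, return the intervals left uncovered.

    def find_gaps(path_positions, max_dist, lo, hi):
        """Sub-intervals of [lo, hi] farther than max_dist from every path."""
        covered = sorted((p - max_dist, p + max_dist) for p in path_positions)
        gaps, cursor = [], lo
        for a, b in covered:
            if a > cursor:
                gaps.append((cursor, min(a, hi)))  # uncovered stretch before this path
            cursor = max(cursor, b)
        if cursor < hi:
            gaps.append((cursor, hi))              # uncovered tail of the site
        return gaps
    ```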

  6. Ultrafast stimulated Raman parallel adiabatic passage by shaped pulses

    SciTech Connect (OSTI)

    Dridi, G.; Guerin, S.; Hakobyan, V.; Jauslin, H. R.; Eleuch, H.

    2009-10-15

    We present a general and versatile technique of population transfer based on parallel adiabatic passage by femtosecond shaped pulses. Their amplitude and phase are specifically designed to optimize the adiabatic passage corresponding to parallel eigenvalues at all times. We show that this technique allows robust adiabatic population transfer in a Raman system with a total pulse area as low as 3π, corresponding to a fluence one order of magnitude below the conventional stimulated Raman adiabatic passage process. This process of short duration, typically picosecond and subpicosecond, is easily implementable with modern pulse shaper technology and opens the possibility of ultrafast robust population transfer with interesting applications in quantum information processing.

  7. Laser Safety Method For Duplex Open Loop Parallel Optical Link

    DOE Patents [OSTI]

    Baumgartner, Steven John (Zumbro Falls, MN); Hedin, Daniel Scott (Rochester, MN); Paschal, Matthew James (Rochester, MN)

    2003-12-02

    A method and apparatus are provided to ensure that laser optical power does not exceed a "safe" level in an open loop parallel optical link in the event that a fiber optic ribbon cable is broken or otherwise severed. A duplex parallel optical link includes a transmitter and receiver pair and a fiber optic ribbon that includes a designated number of channels that cannot be split. The duplex transceiver includes a corresponding transmitter and receiver that are physically attached to each other and cannot be detached therefrom, so as to ensure safe, laser optical power in the event that the fiber optic ribbon cable is broken or severed. Safe optical power is ensured by redundant current and voltage safety checks.

  8. Buffered coscheduling for parallel programming and enhanced fault tolerance

    DOE Patents [OSTI]

    Petrini, Fabrizio (Los Alamos, NM); Feng, Wu-chun (Los Alamos, NM)

    2006-01-31

    A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval is accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors

  9. Coupled Serial and Parallel Non-uniform SQUIDs

    SciTech Connect (OSTI)

    Longhini, Patrick; In, Visarath; Berggren, Susan; Palacios, Antonio; Leese de Escobar, Anna

    2011-04-19

    In this work we numerically model series and parallel non-uniform superconducting quantum interference device (SQUID) arrays. Previous work has shown that for a series SQUID array constructed with a random distribution of loop sizes (i.e., different areas for each SQUID loop), there exists a unique 'anti-peak' at zero magnetic field in the voltage versus applied magnetic field (V-B) response. Similar results extend to a parallel SQUID array, where the difference lies in the arrangement of the Josephson junctions. Other system parameters such as bias current, the number of loops, and mutual inductances are varied to demonstrate the change in dynamic range and linearity of the V-B response. Application of the SQUID array as a low-noise amplifier (LNA) would increase link margins and affect the entire communication system. For unmanned aerial vehicles (UAVs), where size, weight, and power are limited, the SQUID array would allow the use of practical 'electrically small' antennas that provide acceptable gain.

  10. Beam Dynamics Studies of Parallel-Bar Deflecting Cavities

    SciTech Connect (OSTI)

    S. Ahmed, G. Krafft, K. Detrick, S. Silva, J. Delayen, M. Spata, M. Tiefenback, A. Hofler, K. Beard

    2011-03-01

    We have performed three-dimensional simulations of beam dynamics for parallel-bar transverse electromagnetic mode (TEM) type RF separators: normal- and super-conducting. The compact size of these cavities as compared to conventional TM$_{110}$-type structures is more attractive, particularly at low frequency. Highly concentrated electromagnetic fields between the parallel bars provide strong electrical stability to the beam against any mechanical disturbance. An array of six 2-cell normal-conducting cavities, or a one- or two-cell superconducting structure, is sufficient to produce the required vertical displacement at the Lambertson magnet. Both the normal- and super-conducting structures show very small emittance dilution due to the vertical kick of the beam.

  11. Performance evaluation of a parallel sparse lattice Boltzmann solver

    SciTech Connect (OSTI)

    Axner, L.; Bernsdorf, J.; Zeiser, T.; Lammers, P.; Linxweiler, J.; Hoekstra, A.G.

    2008-05-01

    We develop a performance prediction model for a parallelized sparse lattice Boltzmann solver and present performance results for simulations of flow in a variety of complex geometries. A special focus is on partitioning and memory/load balancing strategy for geometries with a high solid fraction and/or complex topology such as porous media, fissured rocks and geometries from medical applications. The topology of the lattice nodes representing the fluid fraction of the computational domain is mapped on a graph. Graph decomposition is performed with both multilevel recursive-bisection and multilevel k-way schemes based on modified Kernighan-Lin and Fiduccia-Mattheyses partitioning algorithms. Performance results and optimization strategies are presented for a variety of platforms, showing a parallel efficiency of almost 80% for the largest problem size. A good agreement between the performance model and experimental results is demonstrated.

  12. Xyce Parallel Electronic Simulator : reference guide, version 4.1.

    SciTech Connect (OSTI)

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  13. Parallel and Antiparallel Interfacial Coupling in AF-FM Bilayers

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Cooling an antiferromagnetic-ferromagnetic bilayer in a magnetic field typically results in a remanent (zero-field) magnetization in the ferromagnet (FM) that is always in the direction of the field during cooling (positive Mrem). Strikingly, when FeF2 is the antiferromagnet (AF), cooling in a field can lead to a remanent magnetization opposite to the field (negative Mrem). A collaboration led by researchers from the Stanford

  16. CHICO2 - A pixelated parallel-plate avalanche counter

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    CHICO2 is an upgraded version of CHICO (Compact Heavy Ion COunter) with improved position resolution. CHICO was designed specifically for Gammasphere as an auxiliary detector for charged-particle detection. The design and fabrication work were carried out at the University of Rochester under NSF funding. A total of 26 Gammasphere/CHICO experiments were successfully fielded between 1996 and 2008 using various experimental techniques such as the

  17. Parallel Element Agglomeration Algebraic Multigrid and Upscaling Library

    Energy Science and Technology Software Center (OSTI)

    2015-02-19

    ParFELAG is a parallel distributed-memory C++ library for numerical upscaling of finite element discretizations. It provides optimal complexity algorithms to build multilevel hierarchies and solvers that can be used for solving a wide class of partial differential equations (elliptic, hyperbolic, saddle point problems) on general unstructured meshes (under the assumption that the topology of the agglomerated entities is correct). Additionally, a novel multilevel solver for saddle point problems with divergence constraint is implemented.

  18. Electronically commutated serial-parallel switching for motor windings

    DOE Patents [OSTI]

    Hsu, John S. (Oak Ridge, TN)

    2012-03-27

    A method and a circuit for controlling an ac machine comprises controlling a full bridge network of commutation switches which are connected between a multiphase voltage source and the phase windings to switch the phase windings between a parallel connection and a series connection while providing commutation discharge paths for electrical current resulting from inductance in the phase windings. This provides extra torque for starting a vehicle from lower battery current.

  19. Multithreaded processor architecture for parallel symbolic computation. Technical report

    SciTech Connect (OSTI)

    Fujita, T.

    1987-09-01

    This paper describes the Multilisp Architecture for Symbolic Applications (MASA), which is a multithreaded processor architecture for parallel symbolic computation with various features intended for effective Multilisp program execution. The principal mechanisms exploited for this processor are multiple contexts, interleaved pipeline execution from separate instruction streams, and synchronization based on a bit in each memory cell. The tagged architecture approach is taken for Lisp program execution, and trap conditions are provided for future object manipulation and garbage collection.

  20. A Parallel Stochastic Framework for Reservoir Characterization and History Matching

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Thomas, Sunil G.; Klie, Hector M.; Rodriguez, Adolfo A.; Wheeler, Mary F.

    2011-01-01

    The spatial distribution of parameters that characterize the subsurface is never known to any reasonable level of accuracy required to solve the governing PDEs of multiphase flow or species transport through porous media. This paper presents a numerically cheap, yet efficient, accurate and parallel framework to estimate reservoir parameters, for example, medium permeability, using sensor information from measurements of the solution variables such as phase pressures, phase concentrations, fluxes, and seismic and well log data. Numerical results are presented to demonstrate the method.

  1. Xyce parallel electronic simulator reference guide, version 6.0.

    SciTech Connect (OSTI)

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.

    2013-08-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

  2. Xyce parallel electronic simulator reference guide, version 6.1

    SciTech Connect (OSTI)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  3. Identifying failure in a tree network of a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J.; Pinnow, Kurt W.; Wallenfelt, Brian P.

    2010-08-24

    Methods, parallel computers, and products are provided for identifying failure in a tree network of a parallel computer. The parallel computer includes one or more processing sets including an I/O node and a plurality of compute nodes. For each processing set embodiments include selecting a set of test compute nodes, the test compute nodes being a subset of the compute nodes of the processing set; measuring the performance of the I/O node of the processing set; measuring the performance of the selected set of test compute nodes; calculating a current test value in dependence upon the measured performance of the I/O node of the processing set, the measured performance of the set of test compute nodes, and a predetermined value for I/O node performance; and comparing the current test value with a predetermined tree performance threshold. If the current test value is below the predetermined tree performance threshold, embodiments include selecting another set of test compute nodes. If the current test value is not below the predetermined tree performance threshold, embodiments include selecting from the test compute nodes one or more potential problem nodes and testing individually potential problem nodes and links to potential problem nodes.
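
    The decision loop the patent describes (combine measured performances into a test value, compare against a threshold, then either pick a new test set or drill into suspect nodes) can be sketched as below. All names and the exact formula for the test value are illustrative assumptions, not the patent's actual claims.

    ```python
    # Hypothetical sketch of the tree-network failure test described above.
    # The combination formula is an assumption; the patent only says the test
    # value depends on measured I/O-node performance, measured test-node
    # performance, and a predetermined I/O-node performance value.

    def current_test_value(io_perf, test_node_perfs, expected_io_perf):
        """Fold measured I/O-node and compute-node performance into one score."""
        total = io_perf + sum(test_node_perfs)
        return total / (expected_io_perf * (1 + len(test_node_perfs)))

    def identify_suspects(io_perf, test_node_perfs, expected_io_perf, threshold):
        value = current_test_value(io_perf, test_node_perfs, expected_io_perf)
        if value < threshold:
            # Below threshold: try another subset of compute nodes.
            return "select-another-set"
        # Not below threshold: flag the slowest node for individual testing.
        worst = min(range(len(test_node_perfs)), key=lambda i: test_node_perfs[i])
        return ("test-individually", worst)
    ```

    A real implementation would then test the flagged nodes (and the links leading to them) one at a time, as the abstract describes.
    
    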

  4. Generating unstructured nuclear reactor core meshes in parallel

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Jain, Rajeev; Tautges, Timothy J.

    2014-10-24

    Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulation problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate on standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during the reactor assembly and reactor core mesh generation processes. We highlight several reactor core examples, including a very high temperature reactor, a full-core model of the Japanese MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with an XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with the speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.

  5. PArallel Reacting Multiphase FLOw Computational Fluid Dynamic Analysis

    Energy Science and Technology Software Center (OSTI)

    2002-06-01

    PARMFLO is a parallel multiphase reacting flow computational fluid dynamics (CFD) code. It can perform steady or unsteady simulations in three space dimensions. It is intended for use in engineering CFD analysis of industrial flow system components. Its parallel processing capabilities allow it to be applied to problems that use at least an order of magnitude more computational cells than the number that can be used on a typical single-processor workstation (about 10{sup 6} cells in parallel processing mode versus about 10{sup 5} cells in serial processing mode). Alternately, by spreading the work of a CFD problem that could be run on a single workstation over a group of computers on a network, it can bring the runtime down by an order of magnitude or more (typically from many days to less than one day). The software was implemented using the industry-standard Message-Passing Interface (MPI) and domain decomposition in one spatial direction. The phases of a flow problem may include an ideal gas mixture with an arbitrary number of chemical species, and dispersed droplet and particle phases. Regions of porous media may also be included within the domain. The porous media may be packed beds, foams, or monolith catalyst supports. With these features, the code is especially suited to analysis of the mixing of reactants in the inlet chamber of catalytic reactors, coupled to computation of the product yields that result from the flow of the mixture through the catalyst-coated support structure.
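
    Domain decomposition in one spatial direction, as used by PARMFLO's MPI implementation, amounts to assigning each rank a contiguous slab of cells along one axis. A minimal sketch of that partitioning, under the assumption of a near-even split (the actual PARMFLO scheme is not specified in the abstract):

    ```python
    # One-dimensional domain decomposition: rank/size stand in for the values
    # MPI_Comm_rank and MPI_Comm_size would return. Slab bounds are a common
    # balanced-split convention, assumed here for illustration.

    def slab_bounds(n_cells, size, rank):
        """Half-open cell range [lo, hi) owned by `rank` along the split axis."""
        base, extra = divmod(n_cells, size)
        lo = rank * base + min(rank, extra)          # earlier ranks absorb the remainder
        hi = lo + base + (1 if rank < extra else 0)
        return lo, hi
    ```

    For example, 10 cells over 4 ranks gives slabs of 3, 3, 2, and 2 cells; ghost-cell exchange between neighboring slabs would then carry the coupling across rank boundaries.
    
    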

  6. Composing Data Parallel Code for a SPARQL Graph Engine

    SciTech Connect (OSTI)

    Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste; Haglin, David J.; Feo, John

    2013-09-08

    Big data analytics processes large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility of performing in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 shared-memory multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces the memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.
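
    The core idea behind SPARQL-as-graph-matching is that a query is a set of triple patterns whose variables must bind consistently across the data graph. A toy sequential sketch of that matching (not the paper's generated parallel code, and the `?`-prefixed variable convention is borrowed directly from SPARQL syntax):

    ```python
    # Match a list of triple patterns against a list of RDF triples.
    # Terms starting with "?" are variables; everything else must match exactly.

    def match(triples, patterns, binding=None):
        binding = binding or {}
        if not patterns:
            return [binding]                      # all patterns satisfied
        (s, p, o), rest = patterns[0], patterns[1:]
        results = []
        for ts, tp, to in triples:
            b = dict(binding)
            ok = True
            for term, val in ((s, ts), (p, tp), (o, to)):
                if term.startswith("?"):
                    if b.setdefault(term, val) != val:
                        ok = False                # variable already bound differently
                        break
                elif term != val:
                    ok = False                    # constant mismatch
                    break
            if ok:
                results.extend(match(triples, rest, b))
        return results
    ```

    A two-hop query such as `?x knows ?y . ?y knows ?z` then resolves by joining on the shared `?y` binding; the loop over candidate triples is exactly the part a parallel code generator can distribute across threads.
    
    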

  7. Improved parallel solution techniques for the integral transport matrix method

    SciTech Connect (OSTI)

    Zerr, Robert J; Azmy, Yousry Y

    2010-11-23

    Alternative solution strategies to the parallel block Jacobi (PBJ) method for the solution of the global problem with the integral transport matrix method operators have been designed and tested. The most straightforward improvement to the Jacobi iterative method is the Gauss-Seidel alternative. The parallel red-black Gauss-Seidel (PGS) algorithm can improve on the number of iterations and reduce work per iteration by applying an alternating red-black color-set to the subdomains and assigning multiple sub-domains per processor. A parallel GMRES(m) method was implemented as an alternative to stationary iterations. Computational results show that the PGS method can improve on the PBJ method execution time by up to {approx}50% when eight sub-domains per processor are used. However, compared to traditional source iterations with diffusion synthetic acceleration, it is still approximately an order of magnitude slower. The best-performing cases are optically thick because the sub-domains decouple, yielding faster convergence. Further tests revealed that 64 sub-domains per processor was the best-performing level of sub-domain division. An acceleration technique that improves the convergence rate would greatly improve the ITMM. The GMRES(m) method with a diagonal block preconditioner consumes approximately the same time as the PBJ solver but could be improved by an as yet undeveloped, more efficient preconditioner.
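
    The red-black idea that makes PGS parallelizable is easiest to see on a model problem: color the unknowns so that every update of one color depends only on the other color, letting all same-color updates proceed simultaneously. A minimal sketch on the 1-D Poisson equation (an illustration of the coloring principle only, not the ITMM operators):

    ```python
    # Red-black Gauss-Seidel sweeps for u'' = -f on a 1-D grid with fixed
    # boundary values u[0] and u[-1]. Within one color, every update reads
    # only neighbors of the opposite color, so a parallel implementation can
    # update all red points at once, then all black points.

    def red_black_sweeps(u, f, h, sweeps):
        n = len(u)
        for _ in range(sweeps):
            for color in (0, 1):                      # red = even, black = odd
                for i in range(1 + color, n - 1, 2):  # interior points of this color
                    u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
        return u
    ```

    In the PGS algorithm of the paper, the same alternating color-set is applied to whole sub-domains rather than individual grid points, with multiple sub-domains assigned to each processor.
    
    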

  8. Automatic Thread-Level Parallelization in the Chombo AMR Library

    SciTech Connect (OSTI)

    Christen, Matthias; Keen, Noel; Ligocki, Terry; Oliker, Leonid; Shalf, John; Van Straalen, Brian; Williams, Samuel

    2011-05-26

    The increasing on-chip parallelism has substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread-level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite-difference-type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language provides a ready target for automatically migrating the large number of existing algorithms to a hybrid MPI+OpenMP implementation. It also provides access to an auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique, as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.

  9. Design and performance of a scalable, parallel statistics toolkit.

    SciTech Connect (OSTI)

    Thompson, David C.; Bennett, Janine Camille; Pebay, Philippe Pierre

    2010-11-01

    Most statistical software packages implement a broad range of techniques but do so in an ad hoc fashion, leaving users who do not have a broad knowledge of statistics at a disadvantage, since they may not understand all the implications of a given analysis or how to test the validity of results. These packages are also largely serial in nature, or target multicore architectures instead of distributed-memory systems, or provide only a small number of statistics in parallel. This paper surveys a collection of parallel implementations of statistics algorithms developed as part of a common framework over the last 3 years. The framework strategically groups modeling techniques with associated verification and validation techniques to make the underlying assumptions of the statistics more clear. Furthermore, it employs a design pattern specifically targeted for distributed-memory parallelism, where architectural advances in large-scale high-performance computing have been focused. Moment-based statistics (which include descriptive, correlative, and multicorrelative statistics, principal component analysis (PCA), and k-means statistics) scale nearly linearly with the data set size and number of processes. Entropy-based statistics (which include order and contingency statistics) do not scale well when the data in question is continuous or quasi-diffuse, but do scale well when the data is discrete and compact. We confirm and extend our earlier results by now establishing near-optimal scalability with up to 10,000 processes.
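
    The reason moment-based statistics scale so well on distributed-memory systems is that per-process summaries merge pairwise: each process keeps (count, mean, M2) and a reduction tree combines them in log(p) steps. A sketch of that merge, using the well-known Chan et al. parallel-variance update (illustrative helper names, not the toolkit's actual API):

    ```python
    # Per-process summary: (count, mean, M2), where M2 is the sum of squared
    # deviations from the mean; variance = M2 / count.

    def summarize(xs):
        """Welford's online update, as each process would run on its local data."""
        n, mean, m2 = 0, 0.0, 0.0
        for x in xs:
            n += 1
            d = x - mean
            mean += d / n
            m2 += d * (x - mean)
        return n, mean, m2

    def merge(a, b):
        """Combine two summaries exactly, as a reduction operator would."""
        na, ma, m2a = a
        nb, mb, m2b = b
        n = na + nb
        delta = mb - ma
        mean = ma + delta * nb / n
        m2 = m2a + m2b + delta * delta * na * nb / n
        return n, mean, m2
    ```

    Because `merge` is associative, the same operator serves any shape of reduction tree, which is what makes the near-linear scaling with process count possible.
    
    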

  10. Generating unstructured nuclear reactor core meshes in parallel

    SciTech Connect (OSTI)

    Jain, Rajeev; Tautges, Timothy J.

    2014-10-24

    Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulation problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate on standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during the reactor assembly and reactor core mesh generation processes. We highlight several reactor core examples, including a very high temperature reactor, a full-core model of the Japanese MONJU reactor, a pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with an XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with the speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.

  11. Parallel Computing Environments and Methods for Power Distribution System Simulation

    SciTech Connect (OSTI)

    Lu, Ning; Taylor, Zachary T.; Chassin, David P.; Guttromson, Ross T.; Studham, Scott S.

    2005-11-10

    The development of cost-effective high-performance parallel computing on multi-processor supercomputers makes it attractive to port excessively time-consuming simulation software from personal computers (PCs) to supercomputers. The power distribution system simulator (PDSS) takes a bottom-up approach and simulates load at the appliance level, where detailed thermal models for appliances are used. This approach works well for a small power distribution system consisting of a few thousand appliances. When the number of appliances increases, the simulation uses up the PC memory and its run time increases to a point where the approach is no longer feasible for modeling a practical large power distribution system. This paper presents an effort made to port a PC-based power distribution system simulator (PDSS) to a 128-processor shared-memory supercomputer. The paper offers an overview of the parallel computing environment and a description of the modifications made to the PDSS model. The performance of the PDSS running on a standalone PC and on the supercomputer is compared. Future research directions for utilizing parallel computing in power distribution system simulation are also addressed.

  12. Parallel garbage collection without synchronization overhead. Technical report

    SciTech Connect (OSTI)

    Patel, J.H.

    1984-08-01

    Incremental garbage-collection schemes incur substantial overhead that is directly translated as reduced execution efficiency for the user. Parallel garbage-collection schemes implemented via time-slicing on a serial processor also incur this overhead, which might even be aggravated due to context switching. It is useful, therefore, to examine the possibility of implementing a parallel garbage-collection algorithm using a separate processor operating asynchronously with the main list processor. The overhead in such a scheme arises from the synchronization necessary to manage the two processors while maintaining memory consistency. In this paper, the authors present an architecture and supporting parallel garbage-collection algorithms designed for a virtual memory system with separate processors for list processing and for garbage collection. Each processor has its own primary memory; in addition, there is a small common memory which both processors may access. The individual memories swap off a common secondary memory, but no locking mechanism is required. In particular, a page may reside in both memories simultaneously, and indeed may be accessed and modified freely by each processor. A secondary memory controller ensures consistency without necessitating numerous lockouts on the pages.

  13. Primordial massive gravitational waves from Einstein-Chern-Simons-Weyl gravity

    SciTech Connect (OSTI)

    Myung, Yun Soo; Moon, Taeyoon E-mail: tymoon@inje.ac.kr

    2014-08-01

    We investigate the evolution of cosmological perturbations during de Sitter inflation in the Einstein-Chern-Simons-Weyl gravity. Primordial massive gravitational waves are composed of one scalar, two vector and four tensor circularly polarized modes. We show that the vector power spectrum decays quickly like a transversely massive vector in the superhorizon limit z → 0. In this limit, the power spectrum coming from massive tensor modes decays quickly, leading to the conventional tensor power spectrum. Also, we find that in the limit of m{sup 2} → 0 (keeping the Weyl-squared term only), the vector and tensor power spectra disappear. It implies that their power spectra are not gravitationally produced because they (vector and tensor) are decoupled from the expanding de Sitter background, as a result of conformal invariance.

  14. Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-07-14

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  15. Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOE Patents [OSTI]

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-07-07

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  16. IRDC G030.88+00.13: A TALE OF TWO MASSIVE CLUMPS

    SciTech Connect (OSTI)

    Zhang Qizhou; Wang Ke, E-mail: qzhang@cfa.harvard.edu [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States)

    2011-05-20

    Massive stars (M {approx}>10 M{sub sun}) form from the collapse of parsec-scale molecular clumps. How molecular clumps fragment to give rise to massive stars in a cluster with a distribution of masses is unclear. We search for cold cores that may lead to future formation of massive stars in a massive (>10{sup 3} M{sub sun}), low luminosity (4.6 x 10{sup 2} L{sub sun}) infrared dark cloud (IRDC) G030.88+00.13. The NH{sub 3} data from the Very Large Array (VLA) and Green Bank Telescope reveal that the extinction feature seen in the infrared consists of two distinctive clumps along the same line of sight. The C1 clump at 97 km s{sup -1} coincides with the extinction in the Spitzer 8 and 24 {mu}m. Therefore, it is responsible for the majority of the IRDC. The C2 clump at 107 km s{sup -1} is more compact and has a peak temperature of 45 K. Compact dust cores and H{sub 2}O masers revealed in the Submillimeter Array and VLA observations are mostly associated with C2, and none are within the IRDC in C1. The luminosity indicates that neither the C1 nor the C2 clump has yet formed massive protostars. But C1 might be at a precluster forming stage. The simulated observations rule out 0.1 pc cold cores with masses above 8 M{sub sun} within the IRDC. The core masses in C1 and C2 and those in high-mass protostellar objects suggest an evolutionary trend in which the mass of cold cores increases over time. Based on our findings, we propose an empirical picture of massive star formation in which protostellar cores and the embedded protostars undergo simultaneous mass growth during the protostellar evolution.

  17. Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.

    SciTech Connect (OSTI)

    Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.

    2005-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices; and object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message-passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.
    Acknowledgements: The authors would like to acknowledge the entire Sandia National Laboratories HPEMS (High Performance Electrical Modeling and Simulation) team, including Steve Wix, Carolyn Bogdan, Regina Schells, Ken Marx, Steve Brandon and Bill Ballard, for their support on this project. We also appreciate very much the work of Jim Emery, Becky Arnold and Mike Williamson for their help in reviewing this document. Lastly, a very special thanks to Hue Lai for typesetting this document with LaTeX. Trademarks: The information herein is subject to change without notice. Copyright 2002-2003 Sandia Corporation. All rights reserved. Xyce Electronic Simulator and Xyce are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Silicon Graphics, the Silicon Graphics logo and IRIX are registered trademarks of Silicon Graphics, Inc. Microsoft, Windows and Windows 2000 are registered trademarks of Microsoft Corporation. Solaris and UltraSPARC are registered trademarks of Sun Microsystems Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. HP and Alpha are registered trademarks of Hewlett-Packard Company. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. All other trademarks are property of their respective owners. Contacts: Bug Reports http://tvrusso.sandia.gov/bugzilla Email xyce-support@sandia.gov World Wide Web http://www.cs.sandia.gov/xyce

  18. Xyce Parallel Electronic Simulator Users Guide Version 6.4

    SciTech Connect (OSTI)

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2015-12-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message-passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved.
Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce 's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)

  19. Analytic structure of the self-energy for massive gauge bosons at finite

    Office of Scientific and Technical Information (OSTI)

    temperature (Journal Article) | SciTech Connect Analytic structure of the self-energy for massive gauge bosons at finite temperature Citation Details In-Document Search Title: Analytic structure of the self-energy for massive gauge bosons at finite temperature We show that the one-loop self-energy at finite temperature has a unique limit as the external momentum p_μ → 0 if the loop involves propagators with distinct masses. This naturally arises in theories

  20. The Death of a Massive Star Holds Key to Early Universe | U.S. DOE Office

    Office of Science (SC) Website

    of Science (SC) The Death of a Massive Star Holds Key to Early Universe, 12.16.09. Scientists found the

  1. New Mariners and a Massive Map: Berkeley Computers Calculate What's in the

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Sky | Department of Energy New Mariners and a Massive Map: Berkeley Computers Calculate What's in the Sky February 2, 2012 - 12:08pm This is the Southern Galactic Cap view as recorded by the Sloan Digital Sky Survey. A 2.5-meter telescope at Apache Point Observatory in New Mexico took in light from over a third of the total area of the sky (14,000 square degrees) including 1.5 million galaxies. | Photo

  2. Parallel I/O Software Infrastructure for Large-Scale Systems

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    Parallel I/O Software Infrastructure for Large-Scale Systems. An illustration of how MPI-IO file domain...

  3. cray-hdf5-parallel/1.8.13 garbling integers in intel environment

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    cray-hdf5-parallel/1.8.13 garbling integers in intel environment. September 11, 2014. This problem was fixed on 11...

  4. Parallel optics technology assessment for the versatile link project

    SciTech Connect (OSTI)

    Chramowicz, J.; Kwan, S.; Rivera, R.; Prosser, A.; /Fermilab

    2011-01-01

    This poster describes the assessment of commercially available and prototype parallel optics modules for possible use as back end components for the Versatile Link common project. The assessment covers SNAP12 transmitter and receiver modules as well as optical engine technologies in dense packaging options. Tests were performed using vendor evaluation boards (SNAP12) as well as custom evaluation boards (optical engines). The measurements obtained were used to compare the performance of these components with single channel SFP+ components operating at a transmission wavelength of 850 nm over multimode fibers.

  5. Computing NLTE Opacities -- Node Level Parallel Calculation

    SciTech Connect (OSTI)

    Holladay, Daniel

    2015-09-11

    Presentation. The goal: to produce a robust library capable of computing reasonably accurate opacities in-line, with the assumption of LTE relaxed (i.e., non-LTE). Near term: demonstrate acceleration of non-LTE opacity computation. Far term (if funded): connect to application codes with in-line capability, compute opacities, and study science problems. Use efficient algorithms that expose many levels of parallelism and utilize good memory access patterns for use on advanced architectures. Portability to multiple types of hardware, including multicore processors, manycore processors such as KNL, GPUs, etc. Easily coupled to radiation hydrodynamics and thermal radiative transfer codes.

  6. Evaluating parallel relational databases for medical data analysis.

    SciTech Connect (OSTI)

    Rintoul, Mark Daniel; Wilson, Andrew T.

    2012-03-01

    Hospitals have always generated and consumed large amounts of data concerning patients, treatment and outcomes. As computers and networks have permeated the hospital environment it has become feasible to collect and organize all of this data. This raises naturally the question of how to deal with the resulting mountain of information. In this report we detail a proof-of-concept test using two commercially available parallel database systems to analyze a set of real, de-identified medical records. We examine database scalability as data sizes increase as well as responsiveness under load from multiple users.

  7. Performing a global barrier operation in a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-12-09

    Executing computing tasks on a parallel computer that includes compute nodes coupled for data communications, where each compute node executes tasks, with one task on each compute node designated as a master task, including: for each task on each compute node until all master tasks have joined a global barrier: determining whether the task is a master task; if the task is not a master task, joining a single local barrier; if the task is a master task, joining the global barrier and the single local barrier only after all other tasks on the compute node have joined the single local barrier.
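The two-level barrier sequence in this claim can be mimicked with Python threads standing in for tasks. This is a hedged sketch, not the patented implementation: the node and task counts are invented, threads replace compute nodes, and `threading.Barrier` stands in for the hardware barrier networks.

```python
import threading

NUM_NODES = 3
TASKS_PER_NODE = 4

global_barrier = threading.Barrier(NUM_NODES)   # joined by one master per node
results = []
lock = threading.Lock()

def node_threads(node_id):
    # the single local barrier counts every task on this compute node
    local_barrier = threading.Barrier(TASKS_PER_NODE)

    def run_task(task_id):
        local_barrier.wait()          # every task joins the single local barrier;
        if task_id == 0:              # task 0 is the designated master task, and
            global_barrier.wait()     # it joins the global barrier only once the
            with lock:                # local barrier released (all tasks joined)
                results.append(node_id)

    return [threading.Thread(target=run_task, args=(t,))
            for t in range(TASKS_PER_NODE)]

threads = [t for n in range(NUM_NODES) for t in node_threads(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because `Barrier.wait` only returns after all parties arrive, each master reaches the global barrier strictly after its local tasks have joined, which is the ordering the claim requires.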

  8. Local rollback for fault-tolerance in parallel computing systems

    DOE Patents [OSTI]

    Blumrich, Matthias A. (Yorktown Heights, NY); Chen, Dong (Yorktown Heights, NY); Gara, Alan (Yorktown Heights, NY); Giampapa, Mark E. (Yorktown Heights, NY); Heidelberger, Philip (Yorktown Heights, NY); Ohmacht, Martin (Yorktown Heights, NY); Steinmacher-Burow, Burkhard (Boeblingen, DE); Sugavanam, Krishnan (Yorktown Heights, NY)

    2012-01-24

    A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.

  9. Scripts for Scalable Monitoring of Parallel Filesystem Infrastructure

    Energy Science and Technology Software Center (OSTI)

    2014-02-27

    Scripts for scalable monitoring of parallel filesystem infrastructure provide frameworks for monitoring the health of block storage arrays and large InfiniBand fabrics. The block storage framework uses Python multiprocessing to scale the number of monitored arrays with the number of processors in the system. This enables live monitoring of HPC-scale filesystems with 10-50 storage arrays. For InfiniBand monitoring, there are scripts included that monitor the InfiniBand health of each host, along with visualization tools for mapping the topology of complex fabrics.

  10. Digital intermediate frequency QAM modulator using parallel processing

    DOE Patents [OSTI]

    Pao, Hsueh-Yuan (Livermore, CA); Tran, Binh-Nien (San Ramon, CA)

    2008-05-27

    The digital Intermediate Frequency (IF) modulator applies to various modulation types and offers a simple and low cost method to implement a high-speed digital IF modulator using field programmable gate arrays (FPGAs). The architecture eliminates multipliers and sequential processing by storing the pre-computed modulated cosine and sine carriers in ROM look-up-tables (LUTs). The high-speed input data stream is parallel processed using the corresponding LUTs, which reduces the main processing speed, allowing the use of low cost FPGAs.
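A software model of the LUT idea, sketched below under stated assumptions: QPSK symbols, 8 samples per symbol, and invented names (`QPSK`, `LUT`, `modulate`). The point it demonstrates is the one in the abstract: the modulated carrier segments are precomputed, so the fast path is a table lookup with no multipliers.

```python
import math

SAMPLES_PER_SYMBOL = 8
# 2-bit symbol -> (I, Q) constellation point (Gray-coded QPSK)
QPSK = {0b00: (1, 1), 0b01: (-1, 1), 0b11: (-1, -1), 0b10: (1, -1)}

# ROM LUT: for each symbol value, the fully modulated IF waveform segment
# i*cos(wt) - q*sin(wt), precomputed once at build time
LUT = {
    sym: [i * math.cos(2 * math.pi * n / SAMPLES_PER_SYMBOL)
          - q * math.sin(2 * math.pi * n / SAMPLES_PER_SYMBOL)
          for n in range(SAMPLES_PER_SYMBOL)]
    for sym, (i, q) in QPSK.items()
}

def modulate(symbols):
    # runtime path: pure lookups and concatenation, no multiplies,
    # mirroring how the FPGA streams precomputed samples from ROM
    out = []
    for s in symbols:
        out.extend(LUT[s])
    return out

waveform = modulate([0b00, 0b11, 0b01])
```

In the FPGA the same tables live in ROM and the input stream is parallel-processed across LUT instances; the Python model only shows the precompute-then-lookup structure.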

  11. Parallel Computation of Persistent Homology using the Blowup Complex

    SciTech Connect (OSTI)

    Lewis, Ryan; Morozov, Dmitriy

    2015-04-27

    We describe a parallel algorithm that computes persistent homology, an algebraic descriptor of a filtered topological space. Our algorithm is distinguished by operating on a spatial decomposition of the domain, as opposed to a decomposition with respect to the filtration. We rely on a classical construction, called the Mayer--Vietoris blowup complex, to glue global topological information about a space from its disjoint subsets. We introduce an efficient algorithm to perform this gluing operation, which may be of independent interest, and describe how to process the domain hierarchically. We report on a set of experiments that help assess the strengths and identify the limitations of our method.

  12. Implementation of a parallel multilevel secure process. Master's thesis

    SciTech Connect (OSTI)

    Pratt, D.R.

    1988-06-01

    This thesis demonstrates an implementation of a parallel multilevel secure process. This is done within the framework of an electronic-mail system. Security is implemented by GEMSOS, the operating system of the Gemini Trusted Computer Base. A brief history of computer security is followed by a discussion of security kernels. Event counts and sequences are used to provide concurrency control and are covered in detail. The specifications for the system are based upon the requirements for a Headquarters of a hypothetical Marine Battalion in garrison.

  13. SERODS optical data storage with parallel signal transfer

    DOE Patents [OSTI]

    Vo-Dinh, Tuan

    2003-09-02

    Surface-enhanced Raman optical data storage (SERODS) systems having increased reading and writing speeds, that is, increased data transfer rates, are disclosed. In the various SERODS read and write systems, the surface-enhanced Raman scattering (SERS) data is written and read using a two-dimensional process called parallel signal transfer (PST). The various embodiments utilize laser light beam excitation of the SERODS medium, optical filtering, beam imaging, and two-dimensional light detection. Two- and three-dimensional SERODS media are utilized. The SERODS write systems employ either a different laser or a different level of laser power.

  14. SERODS optical data storage with parallel signal transfer

    DOE Patents [OSTI]

    Vo-Dinh, Tuan (Knoxville, TN)

    2003-06-24

    Surface-enhanced Raman optical data storage (SERODS) systems having increased reading and writing speeds, that is, increased data transfer rates, are disclosed. In the various SERODS read and write systems, the surface-enhanced Raman scattering (SERS) data is written and read using a two-dimensional process called parallel signal transfer (PST). The various embodiments utilize laser light beam excitation of the SERODS medium, optical filtering, beam imaging, and two-dimensional light detection. Two- and three-dimensional SERODS media are utilized. The SERODS write systems employ either a different laser or a different level of laser power.

  15. Scripts for Scalable Monitoring of Parallel Filesystem Infrastructure

    SciTech Connect (OSTI)

    2014-02-27

    Scripts for scalable monitoring of parallel filesystem infrastructure provide frameworks for monitoring the health of block storage arrays and large InfiniBand fabrics. The block storage framework uses Python multiprocessing to scale the number of monitored arrays with the number of processors in the system. This enables live monitoring of HPC-scale filesystems with 10-50 storage arrays. For InfiniBand monitoring, there are scripts included that monitor the InfiniBand health of each host, along with visualization tools for mapping the topology of complex fabrics.

  16. Effective matter cosmologies of massive gravity I: non-physical fluids

    SciTech Connect (OSTI)

    Yılmaz, Nejat Tevfik

    2014-08-01

    For the massive gravity, after decoupling from the metric equation we find a broad class of solutions of the Stückelberg sector by solving the background metric in the presence of a diagonal physical metric. We then construct the dynamics of the corresponding FLRW cosmologies which inherit effective matter contribution through the decoupling solution mechanism of the scalar sector.

  17. Deconfinement phase transition in a finite volume in the presence of massive particles

    SciTech Connect (OSTI)

    Ait El Djoudi, A.; Ghenam, L.

    2012-06-27

    We study the QCD deconfinement phase transition from a hadronic gas to a Quark-Gluon Plasma in the presence of massive particles. In particular, the influence of parameters such as the finite volume, finite mass, and number of flavors N_f on the transition point and on the order of the transition is investigated.

  18. NIR SPECTROSCOPIC OBSERVATION OF MASSIVE GALAXIES IN THE PROTOCLUSTER AT z = 3.09

    SciTech Connect (OSTI)

    Kubo, Mariko; Yamada, Toru; Ichikawa, Takashi; Kajisawa, Masaru; Matsuda, Yuichi; Tanaka, Ichi

    2015-01-20

    We present the results of near-infrared spectroscopic observations of the K-band-selected candidate galaxies in the protocluster at z = 3.09 in the SSA22 field. We observed 67 candidates with K_AB < 24 and confirmed redshifts of the 39 galaxies at 2.0 < z_spec < 3.4. Of the 67 candidates, 24 are certainly protocluster members with 3.04 ≤ z_spec ≤ 3.12, which are massive red galaxies that have been unidentified in previous optical observations of the SSA22 protocluster. Many distant red galaxies (J − K_AB > 1.4), hyper extremely red objects (J − K_AB > 2.1), Spitzer MIPS 24 μm sources, active galactic nuclei (AGNs), as well as the counterparts of Lyα blobs and the AzTEC/ASTE 1.1 mm sources in the SSA22 field are also found to be protocluster members. The mass of the SSA22 protocluster is estimated to be ~2-5 × 10^14 M_⊙, and this system is plausibly a progenitor of the most massive clusters of galaxies in the current universe. The reddest (J − K_AB ≳ 2.4) protocluster galaxies are massive galaxies with M_star ≳ 10^11 M_⊙ showing quiescent star formation activities and plausibly dominated by old stellar populations. Most of these massive quiescent galaxies host moderately luminous AGNs detected by X-ray. There are no significant differences in the [O III] λ5007/Hβ emission line ratios and [O III] λ5007 line widths and spatial extents of the protocluster galaxies from those of massive galaxies at z ~ 2-3 in the general field.

  19. Scalable Library for the Parallel Solution of Sparse Linear Systems

    Energy Science and Technology Software Center (OSTI)

    1993-07-14

    BlockSolve is a scalable parallel software library for the solution of large sparse, symmetric systems of linear equations. It runs on a variety of parallel architectures and can easily be ported to others. BlockSolve is primarily intended for the solution of sparse linear systems that arise from physical problems having multiple degrees of freedom at each node point. For example, when the finite element method is used to solve practical problems in structural engineering, each node will typically have anywhere from 3-6 degrees of freedom associated with it. BlockSolve is written to take advantage of problems of this nature; however, it is still reasonably efficient for problems that have only one degree of freedom associated with each node, such as the three-dimensional Poisson problem. It does not require that the matrices have any particular structure other than being sparse and symmetric. BlockSolve is intended to be used within real application codes. It is designed to work best in the context of our experience which indicated that most application codes solve the same linear systems with several different right-hand sides and/or linear systems with the same structure, but different matrix values multiple times.

  20. Long-time dynamics through parallel trajectory splicing

    SciTech Connect (OSTI)

    Perez, Danny; Cubuk, Ekin D.; Waterland, Amos; Kaxiras, Efthimios; Voter, Arthur F.

    2015-11-24

    Simulating the atomistic evolution of materials over long time scales is a longstanding challenge, especially for complex systems where the distribution of barrier heights is very heterogeneous. Such systems are difficult to investigate using conventional long-time scale techniques, and the fact that they tend to remain trapped in small regions of configuration space for extended periods of time strongly limits the physical insights gained from short simulations. We introduce a novel simulation technique, Parallel Trajectory Splicing (ParSplice), that aims at addressing this problem through the timewise parallelization of long trajectories. The computational efficiency of ParSplice stems from a speculation strategy whereby predictions of the future evolution of the system are leveraged to increase the amount of work that can be concurrently performed at any one time, hence improving the scalability of the method. ParSplice is also able to accurately account for, and potentially reuse, a substantial fraction of the computational work invested in the simulation. We validate the method on a simple Ag surface system and demonstrate substantial increases in efficiency compared to previous methods. As a result, we then demonstrate the power of ParSplice through the study of topology changes in Ag42Cu13 core-shell nanoparticles.

  1. Energy Proportionality and Performance in Data Parallel Computing Clusters

    SciTech Connect (OSTI)

    Kim, Jinoh; Chou, Jerry; Rotem, Doron

    2011-02-14

    Energy consumption in datacenters has recently become a major concern due to rising operational costs and scalability issues. Recent solutions to this problem propose the principle of energy proportionality, i.e., the amount of energy consumed by the server nodes must be proportional to the amount of work performed. For data parallelism and fault tolerance purposes, most common file systems used in MapReduce-type clusters maintain a set of replicas for each data block. A covering set is a group of nodes that together contain at least one replica of the data blocks needed for performing computing tasks. In this work, we develop and analyze algorithms to maintain energy proportionality by discovering a covering set that minimizes energy consumption while placing the remaining nodes in low-power standby mode. Our algorithms can also discover covering sets in heterogeneous computing environments. In order to allow more data parallelism, we generalize our algorithms so that they can discover k-covering sets, i.e., sets of nodes that contain at least k replicas of the data blocks. Our experimental results show that we can achieve substantial energy savings without significant performance loss in diverse cluster configurations and working environments.
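The covering-set idea can be illustrated with a small greedy sketch. This is a hedged stand-in, not the paper's algorithm: the replica map is toy data (a real cluster would read it from the filesystem's block-placement metadata), and the greedy heuristic simply picks nodes until every block has at least one live replica.

```python
# block -> nodes holding a replica (invented toy data)
replicas = {
    "b1": {"n1", "n2"},
    "b2": {"n2", "n3"},
    "b3": {"n3", "n4"},
    "b4": {"n1", "n4"},
}

def covering_set(replicas):
    """Greedy heuristic: repeatedly add the node that covers the most
    still-uncovered blocks, until every block has a live replica."""
    all_nodes = {n for nodes in replicas.values() for n in nodes}
    uncovered, chosen = set(replicas), set()
    while uncovered:
        best = max(all_nodes - chosen,
                   key=lambda n: sum(n in replicas[b] for b in uncovered))
        chosen.add(best)
        uncovered = {b for b in uncovered if best not in replicas[b]}
    return chosen

cover = covering_set(replicas)
# everything outside the cover is a candidate for low-power standby mode
standby = {n for ns in replicas.values() for n in ns} - cover
```

A k-covering set generalizes this by requiring k replicas of each block inside the chosen nodes rather than one.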

  2. Long-time dynamics through parallel trajectory splicing

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Perez, Danny; Cubuk, Ekin D.; Waterland, Amos; Kaxiras, Efthimios; Voter, Arthur F.

    2015-11-24

    Simulating the atomistic evolution of materials over long time scales is a longstanding challenge, especially for complex systems where the distribution of barrier heights is very heterogeneous. Such systems are difficult to investigate using conventional long-time scale techniques, and the fact that they tend to remain trapped in small regions of configuration space for extended periods of time strongly limits the physical insights gained from short simulations. We introduce a novel simulation technique, Parallel Trajectory Splicing (ParSplice), that aims at addressing this problem through the timewise parallelization of long trajectories. The computational efficiency of ParSplice stems from a speculation strategy whereby predictions of the future evolution of the system are leveraged to increase the amount of work that can be concurrently performed at any one time, hence improving the scalability of the method. ParSplice is also able to accurately account for, and potentially reuse, a substantial fraction of the computational work invested in the simulation. We validate the method on a simple Ag surface system and demonstrate substantial increases in efficiency compared to previous methods. As a result, we then demonstrate the power of ParSplice through the study of topology changes in Ag42Cu13 core–shell nanoparticles.

  3. A Programming Model Performance Study Using the NAS Parallel Benchmarks

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; Hargrove, Paul; Jin, Haoqiang; Fuerlinger, Karl; Koniges, Alice; Wright, Nicholas J.

    2010-01-01

    Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. Also we compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.

  4. High-performance parallel interface to synchronous optical network gateway

    DOE Patents [OSTI]

    St. John, W.B.; DuBois, D.H.

    1996-12-03

    Disclosed is a system of sending and receiving gateways that interconnects high speed data interfaces, e.g., HIPPI interfaces, through fiber optic links, e.g., a SONET network. An electronic stripe distributor distributes bytes of data from a first interface at the sending gateway onto parallel fiber optics of the fiber optic link to form transmitted data. An electronic stripe collector receives the transmitted data on the parallel fiber optics and reforms the data into a format effective for input to a second interface at the receiving gateway. Preferably, an error correcting syndrome is constructed at the sending gateway and sent with a data frame so that transmission errors can be detected and corrected on a real-time basis. Since the high speed data interface operates faster than any of the fiber optic links, the transmission rate must be adapted to match the available number of fiber optic links, so the sending and receiving gateways monitor the availability of fiber links and adjust the data throughput accordingly. In another aspect, the receiving gateway must have sufficient available buffer capacity to accept an incoming data frame. A credit-based flow control system provides for continuously updating the sending gateway on the available buffer capacity at the receiving gateway. 7 figs.
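The credit-based flow control mentioned in the abstract can be modeled in a few lines. This is a hypothetical toy model, not the patented gateway logic: the `Receiver`/`send_all` names are invented, and a synchronous `drain` call stands in for the asynchronous credit updates a real link would carry.

```python
class Receiver:
    def __init__(self, buffer_slots):
        self.free = buffer_slots      # available buffer capacity
        self.delivered = []

    def grant(self):
        # credits advertised back to the sender: one credit per free slot
        return self.free

    def accept(self, frame):
        assert self.free > 0, "overrun: sender ignored its credits"
        self.free -= 1
        self.delivered.append(frame)

    def drain(self, n):
        # the application consumes frames, freeing buffer slots
        self.free += n

def send_all(frames, rx):
    credits = rx.grant()
    for f in frames:
        while credits == 0:           # stall until the receiver frees buffers
            rx.drain(1)               # stand-in for an async credit update
            credits = rx.grant()
        rx.accept(f)                  # send consumes one credit per frame
        credits -= 1

rx = Receiver(buffer_slots=2)
send_all(list(range(5)), rx)
```

The invariant the scheme guarantees is the one the patent needs: the sender never transmits a frame the receiver cannot buffer.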

  5. Users manual for the Chameleon parallel programming tools

    SciTech Connect (OSTI)

    Gropp, W.; Smith, B.

    1993-06-01

    Message passing is a common method for writing programs for distributed-memory parallel computers. Unfortunately, the lack of a standard for message passing has hampered the construction of portable and efficient parallel programs. In an attempt to remedy this problem, a number of groups have developed their own message-passing systems, each with its own strengths and weaknesses. Chameleon is a second-generation system of this type. Rather than replacing these existing systems, Chameleon is meant to supplement them by providing a uniform way to access many of these systems. Chameleon's goals are to (a) be very lightweight (low overhead), (b) be highly portable, and (c) help standardize program startup and the use of emerging message-passing operations such as collective operations on subsets of processors. Chameleon also provides a way to port programs written using PICL or Intel NX message passing to other systems, including collections of workstations. Chameleon is tracking the Message-Passing Interface (MPI) draft standard and will provide both an MPI implementation and an MPI transport layer. Chameleon provides support for heterogeneous computing by using p4 and PVM. Chameleon's support for homogeneous computing includes the portable libraries p4, PICL, and PVM and vendor-specific implementation for Intel NX, IBM EUI (SP-1), and Thinking Machines CMMD (CM-5). Support for Ncube and PVM 3.x is also under development.

  6. Perm Web: remote parallel and distributed volume visualization

    SciTech Connect (OSTI)

    Wittenbrink, C.M.; Kim, K.; Story, J.; Pang, A.; Hollerbach, K.; Max, N.

    1997-01-01

    In this paper we present a system for visualizing volume data from remote supercomputers (PermWeb). We have developed both parallel volume rendering algorithms, and the World Wide Web software for accessing the data at the remote sites. The implementation uses Hypertext Markup Language (HTML), Java, and Common Gateway Interface (CGI) scripts to connect World Wide Web (WWW) servers/clients to our volume renderers. The front ends are interactive Java classes for specification of view, shading, and classification inputs. We present performance results, and implementation details for connections to our computing resources at the University of California Santa Cruz including a MasPar MP-2, SGI Reality Engine-RE2, and SGI Challenge machines. We apply the system to the task of visualizing trabecular bone from finite element simulations. Fast volume rendering on remote compute servers through a web interface allows us to increase the accessibility of the results to more users. User interface issues, overviews of parallel algorithm developments, and overall system interfaces and protocols are presented. Access is available through Uniform Resource Locator (URL) http://www.cse.ucsc.edu/research/slvg/. 26 refs., 7 figs.

  7. High-performance parallel interface to synchronous optical network gateway

    DOE Patents [OSTI]

    St. John, Wallace B. (Los Alamos, NM); DuBois, David H. (Los Alamos, NM)

    1996-01-01

    A system of sending and receiving gateways interconnects high speed data interfaces, e.g., HIPPI interfaces, through fiber optic links, e.g., a SONET network. An electronic stripe distributor distributes bytes of data from a first interface at the sending gateway onto parallel fiber optics of the fiber optic link to form transmitted data. An electronic stripe collector receives the transmitted data on the parallel fiber optics and reforms the data into a format effective for input to a second interface at the receiving gateway. Preferably, an error correcting syndrome is constructed at the sending gateway and sent with a data frame so that transmission errors can be detected and corrected on a real-time basis. Since the high speed data interface operates faster than any of the fiber optic links, the transmission rate must be adapted to match the available number of fiber optic links, so the sending and receiving gateways monitor the availability of fiber links and adjust the data throughput accordingly. In another aspect, the receiving gateway must have sufficient available buffer capacity to accept an incoming data frame. A credit-based flow control system provides for continuously updating the sending gateway on the available buffer capacity at the receiving gateway.

  8. Aggregating job exit statuses of a plurality of compute nodes executing a parallel application

    DOE Patents [OSTI]

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Mundy, Michael B.

    2015-07-21

    Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.
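The aggregation step described above can be sketched as a small fold over per-node statuses. The data layout and the combine rule (worst nonzero code wins) are invented for illustration; they are not the patented format.

```python
def aggregate_exit_statuses(statuses):
    """Combine per-node exit statuses into one aggregated job status,
    as the job leader compute node would before reporting upstream."""
    return {
        "nodes": len(statuses),
        "worst": max(s["code"] for s in statuses),   # any nonzero code wins
        "messages": [s["msg"] for s in statuses if s["code"] != 0],
    }

# each entry describes one compute node's execution of its portion
# of the parallel application (toy data; node 0 acts as job leader)
subset = [
    {"node": 0, "code": 0, "msg": "ok"},
    {"node": 1, "code": 0, "msg": "ok"},
    {"node": 2, "code": 1, "msg": "nonzero exit in rank 2"},
]
summary = aggregate_exit_statuses(subset)
```

Sending one aggregated record instead of one record per node is what keeps the reporting cost flat as the subset of compute nodes grows.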

  9. Cooperative storage of shared files in a parallel computing system with dynamic block size

    DOE Patents [OSTI]

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2015-11-10

    Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).
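The block-size rule quoted above (total data divided by the number of parallel processes) is easy to make concrete. In this sketch the exchange step is simulated with plain lists; `rebalance` is an invented name, and a real implementation would move bytes between processes rather than just computing the target sizes.

```python
def dynamic_block_size(total_bytes, num_procs):
    # the stated example rule: total amount of data to be stored
    # divided by the number of parallel processes
    return total_bytes // num_procs

def rebalance(bytes_held):
    """Simulate the exchange step: after trading data with neighbours,
    every process holds exactly one block of the determined size."""
    block = dynamic_block_size(sum(bytes_held), len(bytes_held))
    return [block] * len(bytes_held), block

# four processes generated uneven amounts of data for the shared object
balanced, block = rebalance([900, 1100, 1000, 1000])
```

Each process then writes its single equal-sized block to the file system, which is what makes the writes to the shared object uniform.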

  10. The parallel I/O architecture of the High Performance Storage System (HPSS)

    SciTech Connect (OSTI)

    Watson, R.W.; Coyne, R.A.

    1995-02-01

    Rapid improvements in computational science, processing capability, main memory sizes, data collection devices, multimedia capabilities and integration of enterprise data are producing very large datasets (10s-100s of gigabytes to terabytes). This rapid growth of data has resulted in a serious imbalance in I/O and storage system performance and functionality. One promising approach to restoring balanced I/O and storage system performance is use of parallel data transfer techniques for client access to storage, device-to-device transfers, and remote file transfers. This paper describes the parallel I/O architecture and mechanisms, Parallel Transport Protocol, parallel FTP, and parallel client Application Programming Interface (API) used by the High Performance Storage System (HPSS). Parallel storage integration issues with a local parallel file system are also discussed.

  11. Design of dynamic load-balancing tools for parallel applications

    SciTech Connect (OSTI)

    Devine, K.D.; Hendrickson, B.A.; Boman, E.G.; St. John, M.; Vaughan, C.T.

    2000-01-03

    The design of general-purpose dynamic load-balancing tools for parallel applications is more challenging than the design of static partitioning tools. Both algorithmic and software engineering issues arise. The authors have addressed many of these issues in the design of the Zoltan dynamic load-balancing library. Zoltan has an object-oriented interface that makes it easy to use and provides separation between the application and the load-balancing algorithms. It contains a suite of dynamic load-balancing algorithms, including both geometric and graph-based algorithms. Its design makes it valuable both as a partitioning tool for a variety of applications and as a research test-bed for new algorithmic development. In this paper, the authors describe Zoltan's design and demonstrate its use in an unstructured-mesh finite element application.

  12. Runtime optimization of an application executing on a parallel computer

    DOE Patents [OSTI]

    Faraj, Daniel A.; Smith, Brian E.

    2013-01-29

    Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
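The branching described in the abstract reduces to one predicate. This sketch is our paraphrase of the claim language, with illustrative names, not code from the patent:

```python
def should_tune(is_root_based, call_sites):
    """A collective operation is executed in a tuning session unless it is
    root-based AND the compute nodes identified it at different call sites."""
    if not is_root_based:
        return True
    return len(set(call_sites)) == 1  # all nodes saw the same call site
```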

  13. Microchannel cross load array with dense parallel input

    DOE Patents [OSTI]

    Swierkowski, Stefan P.

    2004-04-06

    An architecture or layout for microchannel arrays using T or Cross (+) loading for electrophoresis or other injection and separation chemistry that are performed in microfluidic configurations. This architecture enables a very dense layout of arrays of functionally identical shaped channels and it also solves the problem of simultaneously enabling efficient parallel shapes and biasing of the input wells, waste wells, and bias wells at the input end of the separation columns. One T load architecture uses circular holes with common rows, but not columns, which allows the flow paths for each channel to be identical in shape, using multiple mirror image pieces. Another T load architecture enables the access hole array to be formed on a biaxial, collinear grid suitable for EDM micromachining (square holes), with common rows and columns.

  14. Tracking moving radar targets with parallel, velocity-tuned filters

    DOE Patents [OSTI]

    Bickel, Douglas L.; Harmony, David W.; Bielek, Timothy P.; Hollowell, Jeff A.; Murray, Margaret S.; Martinez, Ana

    2013-04-30

    Radar data associated with radar illumination of a movable target is processed to monitor motion of the target. A plurality of filter operations are performed in parallel on the radar data so that each filter operation produces target image information. The filter operations are defined to have respectively corresponding velocity ranges that differ from one another. The target image information produced by one of the filter operations represents the target more accurately than the target image information produced by the remainder of the filter operations when a current velocity of the target is within the velocity range associated with the one filter operation. In response to the current velocity of the target being within the velocity range associated with the one filter operation, motion of the target is tracked based on the target image information produced by the one filter operation.
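The selection step — preferring the filter whose velocity range contains the target's current velocity — can be sketched as follows (illustrative names; the patent's filters run on radar data, which this toy omits):

```python
def select_filter(velocity, velocity_ranges):
    """Return the index of the filter whose [lo, hi) velocity range contains
    the target's current velocity, or None if no range matches."""
    for i, (lo, hi) in enumerate(velocity_ranges):
        if lo <= velocity < hi:
            return i
    return None
```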

  15. Runtime optimization of an application executing on a parallel computer

    DOE Patents [OSTI]

    Faraj, Daniel A; Smith, Brian E

    2014-11-18

    Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.

  16. Determining collective barrier operation skew in a parallel computer

    DOE Patents [OSTI]

    Faraj, Daniel A.

    2015-11-24

    Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.
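The final calculation is just the spread of the measured completion times; a minimal sketch (function name ours):

```python
def barrier_skew(completion_times):
    """Skew = maximum minus minimum barrier completion time across the
    per-delayed-node measurements."""
    return max(completion_times) - min(completion_times)
```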

  17. Runtime optimization of an application executing on a parallel computer

    DOE Patents [OSTI]

    Faraj, Daniel A; Smith, Brian E

    2014-11-25

    Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.

  18. Synchronizing compute node time bases in a parallel computer

    DOE Patents [OSTI]

    Chen, Dong; Faraj, Daniel A; Gooding, Thomas M; Heidelberger, Philip

    2014-12-30

    Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.
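A toy model of why setting each node's time base to its root-to-node latency synchronizes the clocks: the pulse reaches node i latency[i] after it is sent, so subtracting that latency puts every node at the same instant. The function and its zero-origin pulse time are our illustration, not the patented mechanism:

```python
def time_bases_after_pulse(latencies):
    """For each node, pulse arrival time minus that node's measured latency:
    all nodes recover the same (root) send instant."""
    pulse_sent = 0.0
    return {node: (pulse_sent + lat) - lat  # arrival minus latency offset
            for node, lat in latencies.items()}
```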

  19. Synchronizing compute node time bases in a parallel computer

    DOE Patents [OSTI]

    Chen, Dong; Faraj, Daniel A; Gooding, Thomas M; Heidelberger, Philip

    2015-01-27

    Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.

  20. Optimized collectives using a DMA on a parallel computer

    DOE Patents [OSTI]

Chen, Dong; Dozsa, Gabor; Giampapa, Mark E.; Heidelberger, Philip

    2011-02-08

    Optimizing collective operations using direct memory access controller on a parallel computer, in one aspect, may comprise establishing a byte counter associated with a direct memory access controller for each submessage in a message. The byte counter includes at least a base address of memory and a byte count associated with a submessage. A byte counter associated with a submessage is monitored to determine whether at least a block of data of the submessage has been received. The block of data has a predetermined size, for example, a number of bytes. The block is processed when the block has been fully received, for example, when the byte count indicates all bytes of the block have been received. The monitoring and processing may continue for all blocks in all submessages in the message.
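The monitoring step can be sketched as integer arithmetic on the byte counter (illustrative only; the patented DMA counters are hardware registers, not Python):

```python
def ready_blocks(byte_count, block_size, blocks_processed):
    """How many new full blocks of a submessage can be processed, given the
    byte counter's current value and the blocks already handled."""
    return max(0, byte_count // block_size - blocks_processed)
```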

  1. Stochastic PArallel Rarefied-gas Time-accurate Analyzer

    Energy Science and Technology Software Center (OSTI)

    2014-01-24

The SPARTA package is software for simulating low-density fluids via the Direct Simulation Monte Carlo (DSMC) method, which is a particle-based method for tracking particle trajectories and collisions as a model of a multi-species gas. The main component of SPARTA is a simulation code which allows the user to specify a simulation domain, populate it with particles, embed triangulated surfaces as boundary conditions for the flow, overlay a grid for finding pairs of collision partners, and evolve the system in time via explicit timestepping. The package also includes various pre- and post-processing tools, useful for setting up simulations and analyzing the results. The simulation code runs either in serial on a single processor or desktop machine, or can be run in parallel using the MPI message-passing library, to enable faster performance on large problems.

  2. DMA shared byte counters in a parallel computer

    DOE Patents [OSTI]

    Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos

    2010-04-06

A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFO metadata maintains the memory locations of the injection FIFOs, including their current heads and tails, and the reception FIFO metadata maintains the memory locations of the reception FIFOs, including their current heads and tails. The injection byte counters and reception byte counters may be shared between messages.

  3. Development of parallel DEM for the open source code MFIX

    SciTech Connect (OSTI)

    Gopalakrishnan, Pradeep; Tafti, Danesh

    2013-02-01

    The paper presents the development of a parallel Discrete Element Method (DEM) solver for the open source code, Multiphase Flow with Interphase eXchange (MFIX) based on the domain decomposition method. The performance of the code was evaluated by simulating a bubbling fluidized bed with 2.5 million particles. The DEM solver shows strong scalability up to 256 processors with an efficiency of 81%. Further, to analyze weak scaling, the static height of the fluidized bed was increased to hold 5 and 10 million particles. The results show that global communication cost increases with problem size while the computational cost remains constant. Further, the effects of static bed height on the bubble hydrodynamics and mixing characteristics are analyzed.
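Strong-scaling efficiency figures like the 81% quoted above are conventionally computed as achieved speedup divided by ideal speedup; a minimal sketch (function name ours):

```python
def strong_scaling_efficiency(t_ref, p_ref, t_par, p):
    """Parallel efficiency relative to a reference run:
    (t_ref / t_par) is the achieved speedup, (p / p_ref) the ideal one."""
    return (t_ref / t_par) / (p / p_ref)
```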

  4. Determining collective barrier operation skew in a parallel computer

    DOE Patents [OSTI]

    Faraj, Daniel A.

    2015-12-24

    Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.

  5. Parallel processor-based raster graphics system architecture

    DOE Patents [OSTI]

    Littlefield, Richard J. (Seattle, WA)

    1990-01-01

An apparatus for generating raster graphics images from the graphics command stream includes a plurality of graphics processors connected in parallel, each adapted to receive any part of the graphics command stream for processing the command stream part into pixel data. The apparatus also includes a frame buffer for mapping the pixel data to pixel locations and an interconnection network for interconnecting the graphics processors to the frame buffer. Through the interconnection network, each graphics processor may access any part of the frame buffer concurrently with another graphics processor accessing any other part of the frame buffer. The plurality of graphics processors can thereby concurrently transmit pixel data to pixel locations in the frame buffer.

  6. Final Report: Super Instruction Architecture for Scalable Parallel Computations

    SciTech Connect (OSTI)

    Sanders, Beverly Ann; Bartlett, Rodney; Deumens, Erik

    2013-12-23

The most advanced methods for reliable and accurate computation of the electronic structure of molecular and nano systems are the coupled-cluster techniques. These high-accuracy methods help us to understand, for example, how biological enzymes operate and contribute to the design of new organic explosives. The ACES III software provides a modern, high-performance implementation of these methods optimized for high performance parallel computer systems, ranging from small clusters typical in individual research groups, through larger clusters available in campus and regional computer centers, all the way to high-end petascale systems at national labs, including exploiting GPUs if available. This project enhanced the ACES III software package and used it to study interesting scientific problems.

  7. Nonlocal microscopic theory of quantum friction between parallel metallic slabs

    SciTech Connect (OSTI)

    Despoja, Vito

    2011-05-15

    We present a new derivation of the friction force between two metallic slabs moving with constant relative parallel velocity, based on T=0 quantum-field theory formalism. By including a fully nonlocal description of dynamically screened electron fluctuations in the slab, and avoiding the usual matching-condition procedure, we generalize previous expressions for the friction force, to which our results reduce in the local limit. Analyzing the friction force calculated in the two local models and in the nonlocal theory, we show that for physically relevant velocities local theories using the plasmon and Drude models of dielectric response are inappropriate to describe friction, which is due to excitation of low-energy electron-hole pairs, which are properly included in nonlocal theory. We also show that inclusion of dissipation in the nonlocal electronic response has negligible influence on friction.

  8. Parallel Access of Out-Of-Core Dense Extendible Arrays

    SciTech Connect (OSTI)

    Otoo, Ekow J; Rotem, Doron

    2007-07-26

Datasets used in scientific and engineering applications are often modeled as dense multi-dimensional arrays. For very large datasets, the corresponding array models are typically stored out-of-core as array files. The array elements are mapped onto linear consecutive locations that correspond to the linear ordering of the multi-dimensional indices. Two conventional mappings used are the row-major order and the column-major order of multi-dimensional arrays. Such conventional mappings of dense array files highly limit the performance of applications and the extendibility of the dataset. Firstly, an array file that is organized in say row-major order causes applications that subsequently access the data in column-major order to have abysmal performance. Secondly, any subsequent expansion of the array file is limited to only one dimension. Expansions of such out-of-core conventional arrays along arbitrary dimensions require storage reorganization that can be very expensive. We present a solution for storing out-of-core dense extendible arrays that resolves the two limitations. The method uses a mapping function F*(), together with information maintained in axial vectors, to compute the linear address of an extendible array element when passed its k-dimensional index. We also give the inverse function, F*⁻¹(), for deriving the k-dimensional index when given the linear address. We show how the mapping function, in combination with MPI-IO and a parallel file system, allows for the growth of the extendible array without reorganization and no significant performance degradation of applications accessing elements in any desired order. We give methods for reading and writing sub-arrays into and out of parallel applications that run on a cluster of workstations. The axial-vectors are replicated and maintained in each node that accesses sub-array elements.
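A fixed-shape, row-major analogue of the mapping F*() and its inverse can be sketched as below. The real F*() additionally supports extension along arbitrary dimensions through its axial vectors, which this simplified version does not attempt; the function names are ours:

```python
def row_major_strides(shape):
    """Per-dimension strides for a fixed row-major layout (a simplified
    stand-in for the paper's axial-vector bookkeeping)."""
    strides, acc = [], 1
    for extent in reversed(shape):
        strides.append(acc)
        acc *= extent
    return list(reversed(strides))

def linear_address(index, strides):
    # F*()-like forward mapping: k-dimensional index -> linear address
    return sum(i * s for i, s in zip(index, strides))

def k_index(addr, strides):
    # inverse mapping for strictly decreasing row-major strides
    idx = []
    for s in strides:
        idx.append(addr // s)
        addr %= s
    return tuple(idx)
```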

  9. Topologically Massive Yang-Mills field on the Null-Plane: A Hamilton-Jacobi approach

    SciTech Connect (OSTI)

    Bertin, M. C.; Pimentel, B. M.; Valcarcel, C. E.; Zambrano, G. E. R.

    2010-11-12

    Non-abelian gauge theories are super-renormalizable in 2+1 dimensions and suffer from infrared divergences. These divergences can be avoided by adding a Chern-Simons term, i.e., building a Topologically Massive Theory. In this sense, we are interested in the study of the Topologically Massive Yang-Mills theory on the Null-Plane. Since this is a gauge theory, we need to analyze its constraint structure which is done with the Hamilton-Jacobi formalism. We are able to find the complete set of Hamiltonian densities, and build the Generalized Brackets of the theory. With the GB we obtain a set of involutive Hamiltonian densities, generators of the evolution of the system.

  10. The coupling to matter in massive, bi- and multi-gravity

    SciTech Connect (OSTI)

    Noller, Johannes; Melville, Scott E-mail: scott.melville@queens.ox.ac.uk

    2015-01-01

    In this paper we construct a family of ways in which matter can couple to one or more 'metrics'/spin-2 fields in the vielbein formulation. We do so subject to requiring the weak equivalence principle and the absence of ghosts from pure spin-2 interactions generated by the matter action. Results are presented for Massive, Bi- and Multi-Gravity theories and we give explicit expressions for the effective matter metric in all of these cases.

  11. Blast furnace injection of massive quantities of coal with enriched air or pure oxygen

    SciTech Connect (OSTI)

Ponghis, N.; Dufresne, P.; Vidal, R.; Poos, A.

    1993-01-01

An extensive study of the phenomena associated with the blast furnace injection of massive quantities of coal is described. Trials with conventional lances or oxy-coal injectors and hot blast at different oxygen contents (up to 40%) or with cold pure oxygen were carried out at coal-to-oxygen ratios corresponding to a range of 150 to 440 kg. Pilot scale rigs, empty or filled with coke, as well as industrial blast furnaces were utilized.

  12. Topologically massive Yang-Mills: A Hamilton-Jacobi constraint analysis

    SciTech Connect (OSTI)

Bertin, M. C.; Pimentel, B. M.; Valcárcel, C. E.; Zambrano, G. E. R.

    2014-04-15

    We analyse the constraint structure of the topologically massive Yang-Mills theory in instant-form and null-plane dynamics via the Hamilton-Jacobi formalism. The complete set of hamiltonians that generates the dynamics of the system is obtained from the Frobenius integrability conditions, as well as its characteristic equations. As generators of canonical transformations, the hamiltonians are naturally linked to the generator of Lagrangian gauge transformations.

  13. The fate of high redshift massive compact galaxies in dense environments

    SciTech Connect (OSTI)

Kaufmann, Tobias; Mayer, Lucio; Carollo, Marcella; Feldmann, Robert (Fermilab; U. Chicago, KICP)

    2012-01-01

Massive compact galaxies seem to be more common at high redshift than in the local universe, especially in denser environments. To investigate the fate of such massive galaxies identified at z ≈ 2 we analyse the evolution of their properties in three cosmological hydrodynamical simulations that form virialized galaxy groups of mass ≈ 10¹³ M☉ hosting a central massive elliptical/S0 galaxy by redshift zero. We find that at redshift ≈ 2 the population of galaxies with M* > 2 × 10¹⁰ M☉ is diverse in terms of mass, velocity dispersion, star formation and effective radius, containing both very compact and relatively extended objects. In each simulation all the compact satellite galaxies have merged into the central galaxy by redshift 0 (with the exception of one simulation, in which one such satellite galaxy survives). Satellites of similar mass at z = 0 are all less compact than their high-redshift counterparts. They form later than the galaxies in the z = 2 sample and enter the group potential at z < 1, when dynamical friction times are longer than the Hubble time. Also, by z = 0 the central galaxies have substantially increased their characteristic radius via a combination of in situ star formation and mergers. Hence in a group environment descendants of compact galaxies either evolve towards larger sizes or they disappear before the present time as a result of the environment in which they evolve. Since the group-sized halos that we consider are representative of dense environments in the ΛCDM cosmology, we conclude that the majority of high-redshift compact massive galaxies do not survive until today as a result of the environment.

  14. The Princeton Tritium Observatory for Light, Early Universe, Massive Neutrino Yield (PTOLEMY) Prototype

    Office of Environmental Management (EM)

Princeton Tritium Observatory for Light, Early Universe, Massive Neutrino Yield (PTOLEMY) Tritium Focus Group Meeting, Sept 24, 2014. C.A. Gentile and P.G. Efthimion on behalf of the PTOLEMY team. Motivation: Big bang relic neutrinos are predicted to be amongst the oldest and smallest particles in the universe. Information on their mass and density would significantly enhance our understanding of elementary particles, the ways in which mass is distributed, and the formation of the universe.

  15. Preliminary Failure Modes and Effects Analysis of the US Massive Gas Injection Disruption Mitigation System Design

    SciTech Connect (OSTI)

    Lee C. Cadwallader

    2013-10-01

    This report presents the results of a preliminary failure modes and effects analysis (FMEA) of a candidate design for the ITER Disruption Mitigation System. This candidate is the Massive Gas Injection System that provides machine protection in a plasma disruption event. The FMEA was quantified with generic component failure rate data as well as some data calculated from operating facilities, and the failure events were ranked for their criticality to system operation.

  16. "On the Formation of Massive Galaxies" | Princeton Plasma Physics Lab

    Broader source: All U.S. Department of Energy (DOE) Office Webpages (Extended Search)

    December 19, 2012, 4:15pm Colloquia MBG Auditorium "On the Formation of Massive Galaxies" Professor Jeremiah Ostriker Princeton University Presentation: File WC19DEC2-12_JOstriker.pptx Looking backwards, using fossil evidence from nearby galaxies provides a plausible picture of how galaxies have formed over cosmic time. Also, going forwards, the present quite definite cosmological model, shows how perturbations grew from low amplitude fluctuations via standard physical processes to the

  17. Spherically symmetric analysis on open FLRW solution in non-linear massive gravity

    SciTech Connect (OSTI)

    Chiang, Chien-I; Izumi, Keisuke; Chen, Pisin E-mail: izumi@phys.ntu.edu.tw

    2012-12-01

We study non-linear massive gravity in the spherically symmetric context. Our main motivation is to investigate the effect of the helicity-0 mode, which remains elusive after analysis of cosmological perturbations around an open Friedmann-Lemaître-Robertson-Walker (FLRW) universe. The non-linear form of the effective energy-momentum tensor stemming from the mass term is derived for the spherically symmetric case. Only in the special case where the area of the two-sphere is not deviated away from the FLRW universe does the effective energy-momentum tensor become completely the same as that of a cosmological constant. This opens a window for discriminating the non-linear massive gravity from general relativity (GR). Indeed, by further solving these spherically symmetric gravitational equations of motion in vacuum to the linear order, we obtain a solution which has an arbitrary time-dependent parameter. In GR, this parameter is a constant and corresponds to the mass of a star. Our result means that Birkhoff's theorem no longer holds in the non-linear massive gravity and suggests that energy can probably be emitted superluminously (with infinite speed) on the self-accelerating background by the helicity-0 mode, which could be a potential plague of this theory.

  18. Massive gravity on de Sitter and unique candidate for partially massless gravity

    SciTech Connect (OSTI)

Rham, Claudia de; Renaux-Petel, Sébastien E-mail: srenaux@lpthe.jussieu.fr

    2013-01-01

We derive the decoupling limit of Massive Gravity on de Sitter in an arbitrary number of space-time dimensions d. By embedding d-dimensional de Sitter into (d+1)-dimensional Minkowski, we extract the physical helicity-1 and helicity-0 polarizations of the graviton. The resulting decoupling theory is similar to that obtained around Minkowski. We take great care at exploring the partially massless limit and define the unique fully non-linear candidate theory that is free of the helicity-0 mode in the decoupling limit, and which therefore propagates only four degrees of freedom in four dimensions. In the latter situation, we show that a new Vainshtein mechanism is at work in the limit m² → 2H² which decouples the helicity-0 mode when the parameters are different from that of partially massless gravity. As a result, there is no discontinuity between massive gravity and its partially massless limit, just in the same way as there is no discontinuity in the massless limit of massive gravity. The usual bounds on the graviton mass could therefore equivalently well be interpreted as bounds on m² − 2H². When dealing with the exact partially massless parameters, on the other hand, the symmetry at m² = 2H² imposes a specific constraint on matter. As a result the helicity-0 mode decouples without even the need of any Vainshtein mechanism.

  19. Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale

    SciTech Connect (OSTI)

    Daily, Jeffrey A.

    2015-04-21

The field of bioinformatics and computational biology is currently experiencing a data revolution. The exciting prospect of making fundamental biological discoveries is fueling the rapid development and deployment of numerous cost-effective, high-throughput next-generation sequencing technologies. The result is that the DNA and protein sequence repositories are being bombarded with new sequence information. Databases are continuing to report a Moore's law-like growth trajectory in their database sizes, roughly doubling every 18 months. In what seems to be a paradigm-shift, individual projects are now capable of generating billions of raw sequences that need to be analyzed in the presence of already annotated sequence information. While it is clear that data-driven methods, such as sequence homology detection, are becoming the mainstay in the field of computational life sciences, the algorithmic advancements essential for implementing complex data analytics at scale have mostly lagged behind. Sequence homology detection is central to a number of bioinformatics applications including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or "homologous") on the basis of alignment criteria. While there are optimal alignment algorithms to compute pairwise homology, their deployment at large scale is currently not feasible; instead, heuristic methods are used at the expense of quality. In this dissertation, we present the design and evaluation of a parallel implementation for conducting optimal homology detection on distributed memory supercomputers. Our approach uses a combination of techniques from asynchronous load balancing (viz. work stealing, dynamic task counters), data replication, and exact-matching filters to achieve homology detection at scale.
Results for a collection of 2.56M sequences show parallel efficiencies of ~75-100% on up to 8K cores, representing a time-to-solution of 33 seconds. We extend this work with a detailed analysis of single-node sequence alignment performance using the latest CPU vector instruction set extensions. Preliminary results reveal that current sequence alignment algorithms are unable to fully utilize widening vector registers.
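An exact-matching prefilter of the kind the abstract mentions can be sketched as k-mer bucketing: only sequence pairs that share at least one exact k-mer become alignment candidates. The bucketing scheme and names here are illustrative, not necessarily the dissertation's:

```python
from collections import defaultdict
from itertools import combinations

def kmer_prefilter(sequences, k=3):
    """Map each k-mer to the set of sequence ids containing it, then emit
    every pair of ids that co-occur in some bucket."""
    buckets = defaultdict(set)
    for sid, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            buckets[seq[i:i + k]].add(sid)
    candidates = set()
    for sids in buckets.values():
        candidates.update(combinations(sorted(sids), 2))
    return candidates
```

Pairs that pass the filter would then go to the (expensive) optimal pairwise alignment; pairs that share no k-mer are skipped entirely.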

  20. Parallel In Situ Indexing for Data-intensive Computing

    SciTech Connect (OSTI)

    Kim, Jinoh; Abbasi, Hasan; Chacon, Luis; Docan, Ciprian; Klasky, Scott; Liu, Qing; Podhorszki, Norbert; Shoshani, Arie; Wu, Kesheng

    2011-09-09

As computing power increases exponentially, vast amounts of data are created by many scientific research activities. However, the bandwidth for storing the data to disks and reading the data from disks has been improving at a much slower pace. These two trends produce an ever-widening data access gap. Our work brings together two distinct technologies to address this data access issue: indexing and in situ processing. From decades of database research literature, we know that indexing is an effective way to address the data access issue, particularly for accessing relatively small fractions of data records. As data sets increase in size, more and more analysts need to use selective data access, which makes indexing even more important for improving data access. The challenge is that most implementations of indexing technology are embedded in large database management systems (DBMS), but most scientific datasets are not managed by any DBMS. In this work, we choose to include indexes with the scientific data instead of requiring the data to be loaded into a DBMS. We use compressed bitmap indexes from the FastBit software which are known to be highly effective for query-intensive workloads common to scientific data analysis. To use the indexes, we need to build them first. The index building procedure needs to access the whole data set and may also require a significant amount of compute time. In this work, we adapt the in situ processing technology to generate the indexes, thus removing the need to read data from disks and allowing indexes to be built in parallel. The in situ data processing system used is ADIOS, a middleware for high-performance I/O.
Our experimental results show that the indexes can improve the data access time up to 200 times depending on the fraction of data selected, and using in situ data processing system can effectively reduce the time needed to create the indexes, up to 10 times with our in situ technique when using identical parallel settings.
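    The query-acceleration idea behind the compressed bitmap indexes described above can be illustrated with a minimal, uncompressed sketch. This is not FastBit's implementation; the function names and binning scheme are hypothetical.

```python
import bisect

def build_bitmap_index(values, bin_edges):
    """Build one bitmap per value bin; bit i of a bin's bitmap is set
    when record i falls into that bin."""
    bitmaps = [[0] * len(values) for _ in range(len(bin_edges) + 1)]
    for i, v in enumerate(values):
        bitmaps[bisect.bisect_right(bin_edges, v)][i] = 1
    return bitmaps

def query_bins(bitmaps, bins):
    """Answer a selection query by OR-ing the bitmaps of the selected
    bins, returning the matching record ids."""
    hits = [0] * len(bitmaps[0])
    for b in bins:
        hits = [h | x for h, x in zip(hits, bitmaps[b])]
    return [i for i, h in enumerate(hits) if h]
```

    Real bitmap indexes compress each bitmap (e.g. with word-aligned run-length encoding), which is what makes them practical for the query-intensive workloads the abstract describes.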

  1. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOE Patents [OSTI]

    Crosetto, D.B.

    1996-12-31

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.

  2. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOE Patents [OSTI]

    Crosetto, Dario B. (DeSoto, TX)

    1996-01-01

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.

  3. NETZ-a compact high speed parallel microprogrammed machine for signal processing

    SciTech Connect (OSTI)

    Dinur, J.; Lahat, M.

    1984-01-01

    A very fast processor, called NETZ, of unconventional architecture, was developed for real-time execution of highly complex computational algorithms. The unconventional architecture design includes advanced techniques such as the incorporation of two processors working in parallel, parallel processing and pipelining including a high-speed hardware multiplier, the use of a special loop counter, and the use of a variable-length computation cycle. A horizontal microprogrammed control unit allows fast parallel execution. 7 references.

  4. Multi-petascale highly efficient parallel supercomputer (Patent)

    Office of Scientific and Technical Information (OSTI)

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many

  5. Parallel phase-sensitive three-dimensional imaging camera

    DOE Patents [OSTI]

    Smithpeter, Colin L. (Albuquerque, NM); Hoover, Eddie R. (Sandia Park, NM); Pain, Bedabrata (Los Angeles, CA); Hancock, Bruce R. (Altadena, CA); Nellums, Robert O. (Albuquerque, NM)

    2007-09-25

    An apparatus is disclosed for generating a three-dimensional (3-D) image of a scene illuminated by a pulsed light source (e.g. a laser or light-emitting diode). The apparatus, referred to as a phase-sensitive 3-D imaging camera, utilizes a two-dimensional (2-D) array of photodetectors to receive light that is reflected or scattered from the scene and processes an electrical output signal from each photodetector in the 2-D array in parallel using multiple modulators, each having inputs of the photodetector output signal and a reference signal, with the reference signal provided to each modulator having a different phase delay. The output from each modulator is provided to a computational unit which can be used to generate intensity and range information for use in generating a 3-D image of the scene. The 3-D camera is capable of generating a 3-D image using a single pulse of light, or alternatively can be used to generate subsequent 3-D images with each additional pulse of light.

  6. Parallel Environment for the Creation of Stochastics 1.0

    Energy Science and Technology Software Center (OSTI)

    2011-01-06

    PECOS is a computational library for creating and manipulating realizations of stochastic quantities, including scalar uncertain variables, random fields, and stochastic processes. It offers a unified interface to univariate and multivariate polynomial approximations using either orthogonal or interpolation polynomials; numerical integration drivers for Latin hypercube sampling, quadrature, cubature, and sparse grids; and fast Fourier transforms using third party libraries. The PECOS core also offers statistical utilities and transformations between various representations of stochastic uncertainty. PECOS provides a C++ API through which users can generate and transform realizations of stochastic quantities. It is currently used by Sandia’s DAKOTA, Stokhos, and Encore software packages for uncertainty quantification and verification. PECOS generates random sample sets and multi-dimensional integration grids, typically used in forward propagation of scalar uncertainty in computational models (uncertainty quantification (UQ)). PECOS also generates samples of random fields (RFs) and stochastic processes (SPs) from a set of user-defined power spectral densities (PSDs). The RF/SP may be either Gaussian or non-Gaussian and either stationary or nonstationary, and the resulting sample is intended for run-time query by parallel finite element simulation codes. Finally, PECOS supports nonlinear transformations of random variables via the Nataf transformation and extensions.
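    As an illustration of one of the sampling drivers mentioned, a minimal Latin hypercube sampler can be sketched as follows. This is a sketch of the general technique, not PECOS's API; the function name is hypothetical.

```python
import random

def latin_hypercube(n_samples, n_dims, rng=None):
    """Draw one point from each of n_samples equal-probability strata
    per dimension, shuffling the strata independently across dimensions
    so every 1-D projection is evenly stratified."""
    rng = rng or random.Random(0)
    samples = [[0.0] * n_dims for _ in range(n_samples)]
    for d in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        for i in range(n_samples):
            # Place the point uniformly within its assigned stratum.
            samples[i][d] = (strata[i] + rng.random()) / n_samples
    return samples
```

    Compared with plain Monte Carlo sampling, this stratification reduces the variance of sample-mean estimates for the forward uncertainty propagation described above.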

  7. Parallel performance optimizations on unstructured mesh-based simulations

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

    2015-06-01

    This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

  8. Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units

    SciTech Connect (OSTI)

    Beckingsale, D. A.; Gaudin, W. P.; Hornung, R. D.; Gunney, B. T.; Gamblin, T.; Herdman, J. A.; Jarvis, S. A.

    2014-11-17

    Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory's Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.
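    To illustrate the kind of data-parallel operator the abstract mentions, a serial stand-in for coarsening a 2-D mesh patch between refinement levels might look like this. The function name and averaging rule are assumptions for illustration, not the library's API.

```python
def coarsen(fine):
    """Coarsen a 2-D mesh patch by averaging each 2x2 block of fine
    zones into one coarse zone. Each output zone is independent of the
    others, which is what makes the operator data-parallel on a GPU."""
    nx, ny = len(fine), len(fine[0])
    return [[(fine[2 * i][2 * j] + fine[2 * i + 1][2 * j] +
              fine[2 * i][2 * j + 1] + fine[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(ny // 2)]
            for i in range(nx // 2)]
```

    In a GPU implementation each coarse zone would be computed by one thread; the corresponding refine operator interpolates coarse values back onto the finer patch.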

  9. Rivulet Flow In Vertical Parallel-Wall Channel

    SciTech Connect (OSTI)

    D. M. McEligot; G. E. McCreery; P. Meakin

    2006-04-01

    In comparison with studies of rivulet flow over external surfaces, rivulet flow confined by two surfaces has received almost no attention. Fully-developed rivulet flow in vertical parallel-wall channels was characterized, both experimentally and analytically, for flows intermediate between a lower flow limit of drop flow and an upper limit where the rivulets meander. Although this regime is the simplest rivulet flow regime, it does not appear to have been previously investigated in detail. Experiments were performed that measured rivulet widths for aperture spacings ranging from 0.152 mm to 0.914 mm. The results were compared with a simple steady-state analytical model for laminar flow. The model divides the rivulet cross-section into an inner region, which is dominated by viscous and gravitational forces and where essentially all flow is assumed to occur, and an outer region, dominated by capillary forces, where the geometry is determined by the contact angle between the fluid and the wall. Calculations using the model provided excellent agreement with data for inner rivulet widths and good agreement with measurements of outer rivulet widths.

  10. Executing a gather operation on a parallel computer

    DOE Patents [OSTI]

    Archer, Charles J. (Rochester, MN); Ratterman, Joseph D. (Rochester, MN)

    2012-03-20

    Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer according to embodiments of the present invention. Embodiments include configuring, by the logical root, a result buffer of the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node. Embodiments also include repeatedly for each position in the result buffer: determining, by each compute node of an operational group, whether the current position in the result buffer corresponds with the rank of the compute node, if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data, if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data, and storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network.
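    The scheme the abstract describes can be simulated serially in a few lines; this hypothetical sketch (not the patented implementation) shows why OR-ing the per-position contributions reconstructs the gathered buffer: for each position, exactly one node contributes its data and all others contribute zero, so the bitwise OR over all nodes equals that node's data.

```python
def gather_via_or(contributions):
    """Simulate the gather: for each result position p, the node whose
    rank equals p contributes its data; every other node contributes 0;
    the logical root stores the bitwise OR of all contributions."""
    n = len(contributions)
    result = []
    for p in range(n):
        acc = 0
        for rank in range(n):
            acc |= contributions[rank] if rank == p else 0
        result.append(acc)
    return result
```

    On the real machine the OR reduction for each position is performed by the global combining network rather than by a loop on the root.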

  11. Xyce Parallel Electronic Simulator Reference Guide Version 6.4

    SciTech Connect (OSTI)

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2015-12-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. Trademarks: The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5, developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)

  12. Search for Charged Massive Long-Lived Particles Using the D0 Detector

    SciTech Connect (OSTI)

    Xie, Yunhe; /Brown U.

    2009-05-01

    A search for charged massive stable particles has been performed with the D0 detector using 1.1 fb{sup -1} of data. The speed of each particle has been calculated based on the time-of-flight and position information in the muon system. The present research is limited to direct pair production of the charged massive long-lived particles. We do not consider CMSPs that result from the cascade decays of heavier particles. In this analysis, the exact values of the model parameters of the entire supersymmetric particle mass spectrum, relevant for cascade decays, are not important. We found no evidence of a signal. 95% CL cross-section upper limits have been set on the pair production of the stable scalar tau lepton, the gaugino-like charginos, and the higgsino-like charginos. The upper cross-section limits vary from 0.31 pb to 0.04 pb for stau masses in the range between 60 GeV and 300 GeV. We use the nominal value of the theoretical cross section to set limits on the mass of the pair-produced charginos. We exclude pair-produced stable gaugino-like charginos with mass below 206 GeV, and higgsino-like charginos below 171 GeV, respectively. Although the present sensitivity is insufficient to test the model of pair-produced stable staus, we do set cross-section limits which can be applied to the pair production of any charged massive stable particle candidates with similar kinematics. These are the most restrictive limits to date on the cross sections for CMSPs and the first published from the Tevatron Collider Run II. The manuscript was published by Physical Review Letters in April 2009 and is available on arXiv.

  13. Gravitational waves and stalled satellites from massive galaxy mergers at z ≲ 1

    SciTech Connect (OSTI)

    McWilliams, Sean T.; Pretorius, Frans; Ostriker, Jeremiah P.

    2014-07-10

    We present a model for merger-driven evolution of the mass function for massive galaxies and their central supermassive black holes at late times. We discuss the current observational evidence in favor of merger-driven massive galaxy evolution during this epoch, and demonstrate that the observed evolution of the mass function can be reproduced by evolving an initial mass function under the assumption of negligible star formation. We calculate the stochastic gravitational wave signal from the resulting black hole binary mergers in the low-redshift universe (z ≲ 1) implied by this model, and find that this population has a signal-to-noise ratio 2 to 5 times larger than previous estimates for pulsar timing arrays, with a (2σ, 3σ) lower limit within this model of h{sub c}(f = 1 yr{sup -1}) = (1.1 × 10{sup -15}, 6.8 × 10{sup -16}). The strength of this signal is sufficient to make it detectable with high probability under conservative assumptions within the next several years. A principal reason that this result is larger than previous estimates is our use of a recent recalibration of the black hole-stellar mass correlation for the brightest cluster galaxies, which increases our estimate by a factor of ~2 relative to past results. For cases where a galaxy merger fails to lead to a black hole merger, we estimate the probability for a given number of satellite black holes to remain within a massive host galaxy, and interpret the result in light of ULX observations. We find that in rare cases, wandering supermassive black holes may be bright enough to appear as ULXs.

  14. Parallel performance of a preconditioned CG solver for unstructured finite element applications

    SciTech Connect (OSTI)

    Shadid, J.N.; Hutchinson, S.A.; Moffat, H.K.

    1994-06-01

    A parallel unstructured finite element (FE) implementation designed for message passing machines is described. This implementation employs automated problem partitioning algorithms for load balancing unstructured grids, a distributed sparse matrix representation of the global finite element equations and a parallel conjugate gradient (CG) solver. In this paper a number of issues related to the efficient implementation of parallel unstructured mesh applications are presented. These include the differences between structured and unstructured mesh parallel applications, major communication kernels for unstructured CG solvers, automatic mesh partitioning algorithms, and the influence of mesh partitioning metrics on parallel performance. Initial results are presented for example finite element (FE) heat transfer analysis applications on a 1024 processor nCUBE 2 hypercube. Results indicate over 95% scaled efficiencies are obtained for some large problems despite the required unstructured data communication.

  15. Parallel performance of a preconditioned CG solver for unstructured finite element applications

    SciTech Connect (OSTI)

    Shadid, J.N.; Hutchinson, S.A.; Moffat, H.K.

    1994-12-31

    A parallel unstructured finite element (FE) implementation designed for message passing MIMD machines is described. This implementation employs automated problem partitioning algorithms for load balancing unstructured grids, a distributed sparse matrix representation of the global finite element equations and a parallel conjugate gradient (CG) solver. In this paper a number of issues related to the efficient implementation of parallel unstructured mesh applications are presented. These include the differences between structured and unstructured mesh parallel applications, major communication kernels for unstructured CG solvers, automatic mesh partitioning algorithms, and the influence of mesh partitioning metrics on parallel performance. Initial results are presented for example finite element (FE) heat transfer analysis applications on a 1024 processor nCUBE 2 hypercube. Results indicate over 95% scaled efficiencies are obtained for some large problems despite the required unstructured data communication.
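    As background for the two abstracts above, a minimal unpreconditioned conjugate gradient solver can be sketched serially. The papers use a preconditioned, distributed-memory variant in which the matrix-vector product and the dot products are the major communication kernels; this sketch is only the serial skeleton, not their implementation.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive-definite A (dense, as nested
    lists) by plain CG. In the parallel versions, the matvec and the dot
    products are distributed across processors."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual r = b - A x, with x = 0
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:          # converged (squared residual norm)
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

    For the unstructured FE matrices in the papers, the sparse matvec requires gathering remote vector entries along partition boundaries, which is why the mesh partitioning metrics discussed above matter for parallel performance.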

  16. Streamline Integration using MPI-Hybrid Parallelism on a Large Multi-Core Architecture

    SciTech Connect (OSTI)

    Camp, David; Garth, Christoph; Childs, Hank; Pugmire, Dave; Joy, Kenneth I.

    2010-11-01

    Streamline computation in a very large vector field data set represents a significant challenge due to the non-local and data-dependent nature of streamline integration. In this paper, we conduct a study of the performance characteristics of hybrid parallel programming and execution as applied to streamline integration on a large, multi-core platform. With multi-core processors now prevalent in clusters and supercomputers, there is a need to understand the impact of these hybrid systems in order to make the best implementation choice. We use two MPI-based distribution approaches based on established parallelization paradigms, parallelize-over-seeds and parallelize-over-blocks, and present a novel MPI-hybrid algorithm for each approach to compute streamlines. Our findings indicate that the work sharing between cores in the proposed MPI-hybrid parallel implementation results in much improved performance and consumes less communication and I/O bandwidth than a traditional, non-hybrid distributed implementation.
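    The parallelize-over-seeds paradigm mentioned above can be illustrated with a trivial static assignment sketch; the function name and round-robin policy are illustrative assumptions, not the paper's algorithm.

```python
def parallelize_over_seeds(seeds, n_ranks):
    """Statically assign streamline seed points to MPI ranks in
    round-robin fashion (the parallelize-over-seeds strategy). Each rank
    then loads whatever data blocks its own streamlines traverse, which
    trades redundant I/O for the absence of streamline hand-offs."""
    assignment = {r: [] for r in range(n_ranks)}
    for i, seed in enumerate(seeds):
        assignment[i % n_ranks].append(seed)
    return assignment
```

    The contrasting parallelize-over-blocks strategy instead pins data blocks to ranks and forwards each streamline to whichever rank owns the block it enters next.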

  17. Parallel Computation of the Topology of Level Sets

    SciTech Connect (OSTI)

    Pascucci, V; Cole-McLaughlin, K

    2004-12-16

    This paper introduces two efficient algorithms that compute the Contour Tree of a 3D scalar field F and its augmented version with the Betti numbers of each isosurface. The Contour Tree is a fundamental data structure in scientific visualization that is used to preprocess the domain mesh to allow optimal computation of isosurfaces with minimal overhead storage. The Contour Tree can also be used to build user interfaces reporting the complete topological characterization of a scalar field, as shown in Figure 1. Data exploration time is reduced since the user understands the evolution of level set components with changing isovalue. The Augmented Contour Tree provides even more accurate information, segmenting the range space of the scalar field into portions of invariant topology. The exploration time for a single isosurface is also improved since its genus is known in advance. Our first new algorithm augments any given Contour Tree with the Betti numbers of all possible corresponding isocontours in linear time with the size of the tree. Moreover we show how to extend the scheme introduced in [3] with the Betti number computation without increasing its complexity. Thus, we improve the time complexity of our previous approach [10] from O(m log m) to O(n log n + m), where m is the number of cells and n is the number of vertices in the domain of F. Our second contribution is a new divide-and-conquer algorithm that computes the Augmented Contour Tree with improved efficiency. The approach computes the output Contour Tree by merging two intermediate Contour Trees and is independent of the interpolant. In this way we confine any knowledge regarding a specific interpolant to an independent function that computes the tree for a single cell. We have implemented this function for the trilinear interpolant and plan to replace it with higher order interpolants when needed. The time complexity is O(n + t log n), where t is the number of critical points of F.
For the first time we can compute the Contour Tree in linear time in many practical cases where t = O(n{sup 1-{epsilon}}). We report the running times for a parallel implementation, showing good scalability with the number of processors.
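    The level-set topology tracking that underlies contour tree construction can be illustrated with one of its ingredients: sweeping vertices by decreasing value and counting superlevel-set components with union-find, which is how a merge (join) tree records component births and merges. This is a sketch of the general idea, not the paper's divide-and-conquer algorithm.

```python
def superlevel_component_counts(values, edges):
    """For a scalar field on a graph, sweep vertices from high to low
    value and report, after each insertion, the number of connected
    components of the current superlevel set (union-find based)."""
    parent = {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    adj = {i: [] for i in range(len(values))}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    counts, n_comp = [], 0
    for v in sorted(range(len(values)), key=lambda i: -values[i]):
        parent[v] = v                       # component born at v
        n_comp += 1
        for u in adj[v]:
            if u in parent and find(u) != find(v):
                parent[find(u)] = find(v)   # two components merge
                n_comp -= 1
        counts.append(n_comp)
    return counts
```

    Pairing this sweep with its mirror image over sublevel sets, and combining the two trees, is the classical route to the full Contour Tree.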

  18. Super massive black hole in galactic nuclei with tidal disruption of stars

    SciTech Connect (OSTI)

    Zhong, Shiyan; Berczik, Peter; Spurzem, Rainer

    2014-09-10

    Tidal disruption of stars by super massive central black holes from dense star clusters is modeled by high-accuracy direct N-body simulation. The time evolution of the stellar tidal disruption rate, the effect of tidal disruption on the stellar density profile, and, for the first time, the detailed origin of tidally disrupted stars are carefully examined and compared with classic papers in the field. Up to 128k particles are used in the simulations to model the star cluster around a super massive black hole, and we use the particle number and the tidal radius of the black hole as free parameters for a scaling analysis. The transition from the full to the empty loss-cone regime is analyzed in our data, and the tidal disruption rate scales with the particle number, N, in the expected way for both cases. For the first time in numerical simulations (under certain conditions) we can support the concept of a critical radius of Frank and Rees, which claims that most stars are tidally accreted on highly eccentric orbits originating from regions far outside the tidal radius. Due to the consumption of stars moving on radial orbits, a velocity anisotropy develops inside the cluster. Finally, we give estimates for real galactic nuclei based on our simulation results and the scaling analysis.

  19. Linking the spin evolution of massive black holes to galaxy kinematics

    SciTech Connect (OSTI)

    Sesana, A.; Barausse, E.; Dotti, M.; Rossi, E. M. E-mail: barausse@iap.fr E-mail: emr@strw.leidenuniv.nl

    2014-10-20

    We present the results of a semianalytical model that evolves the masses and spins of massive black holes together with the properties of their host galaxies across the cosmic history. As a consistency check, our model broadly reproduces a number of observations, e.g., the cosmic star formation history; the black hole mass, luminosity, and galaxy mass functions at low redshift; the black hole-bulge mass relation; and the morphological distribution at low redshift. For the first time in a semianalytical investigation, we relax the simplifying assumptions of perfect coherency or perfect isotropy of the gas fueling the black holes. The dynamics of gas is instead linked to the morphological properties of the host galaxies, resulting in different spin distributions for black holes hosted in different galaxy types. We compare our results with the observed sample of spin measurements obtained through broad Kα iron line fitting. The observational data disfavor both accretion along a fixed direction and isotropic fueling. Conversely, when the properties of the accretion flow are anchored to the kinematics of the host galaxy, we obtain a good match between theoretical expectations and observations. A mixture of coherent accretion and phases of activity in which the gas dynamics is similar to that of the stars in bulges (i.e., with a significant velocity dispersion superimposed on a net rotation) best describes the data, adding further evidence in support of the coevolution of massive black holes and their hosts.

  20. The multiplicity of massive stars: A high angular resolution survey with the HST fine guidance sensor

    SciTech Connect (OSTI)

    Aldoretta, E. J.; Gies, D. R.; Henry, T. J.; Jao, W.-C.; Norris, R. P., E-mail: emily@astro.umontreal.ca, E-mail: gies@chara.gsu.edu, E-mail: thenry@chara.gsu.edu, E-mail: jao@chara.gsu.edu, E-mail: norris@chara.gsu.edu [Center for High Angular Resolution Astronomy, Department of Physics and Astronomy, Georgia State University, P. O. Box 5060, Atlanta, GA 30302-5060 (United States); and others

    2015-01-01

    We present the results of an all-sky survey made with the Fine Guidance Sensor on the Hubble Space Telescope to search for angularly resolved binary systems among massive stars. The sample of 224 stars comprises mainly Galactic O- and B-type stars and luminous blue variables, plus a few luminous stars in the Large Magellanic Cloud. The FGS TRANS mode observations are sensitive to the detection of companions with an angular separation between 0.01 and 1.0 arcsec and brighter than Δm = 5. The FGS observations resolved 52 binary and 6 triple star systems and detected partially resolved binaries in 7 additional targets (43 of these are new detections). These numbers yield a companion detection frequency of 29% for the FGS survey. We also gathered literature results on the numbers of close spectroscopic binaries and wider astrometric binaries among the sample, and we present estimates of the frequency of multiple systems and the companion frequency for subsets of stars residing in clusters and associations, field stars, and runaway stars. These results confirm the high multiplicity fraction, especially among massive stars in clusters and associations. We show that the period distribution is approximately flat in increments of log P. We identify a number of systems of potential interest for long-term orbital determinations, and we note the importance of some of these companions for the interpretation of the radial velocities and light curves of close binaries that have third companions.

  1. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    SciTech Connect (OSTI)

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message-passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  2. Xyce parallel electronic simulator users guide, version 6.1

    SciTech Connect (OSTI)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms, allowing one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message-passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  3. ON THE WEAK-WIND PROBLEM IN MASSIVE STARS: X-RAY SPECTRA REVEAL A MASSIVE HOT WIND IN {mu} COLUMBAE

    SciTech Connect (OSTI)

    Huenemoerder, David P.; Oskinova, Lidia M.; Todt, Helge; Ignace, Richard; Waldron, Wayne L.; Hamaguchi, Kenji

    2012-09-10

    {mu} Columbae is a prototypical weak-wind O star for which we have obtained a high-resolution X-ray spectrum with the Chandra LETG/ACIS instrument and a low-resolution spectrum with Suzaku. This allows us, for the first time, to investigate the role of X-rays on the wind structure in a bona fide weak-wind system and to determine whether there actually is a massive hot wind. The X-ray emission measure indicates that the outflow is an order of magnitude greater than that derived from UV lines and is commensurate with the nominal wind-luminosity relationship for O stars. Therefore, the {sup w}eak-wind problem{sup -}identified from cool wind UV/optical spectra-is largely resolved by accounting for the hot wind seen in X-rays. From X-ray line profiles, Doppler shifts, and relative strengths, we find that this weak-wind star is typical of other late O dwarfs. The X-ray spectra do not suggest a magnetically confined plasma-the spectrum is soft and lines are broadened; Suzaku spectra confirm the lack of emission above 2 keV. Nor do the relative line shifts and widths suggest any wind decoupling by ions. The He-like triplets indicate that the bulk of the X-ray emission is formed rather close to the star, within five stellar radii. Our results challenge the idea that some OB stars are 'weak-wind' stars that deviate from the standard wind-luminosity relationship. The wind is not weak, but it is hot and its bulk is only detectable in X-rays.

  4. Aberrant Left Inferior Bronchial Artery Originating from the Left Gastric Artery in a Patient with Acute Massive Hemoptysis

    SciTech Connect (OSTI)

    Jiang, Sen Sun, Xi-Wen Yu, Dong Jie, Bing

    2013-10-15

    Massive hemoptysis is a life-threatening condition, and the major source of bleeding in this condition is the bronchial circulation. Bronchial artery embolization is a safe and effective treatment for controlling hemoptysis. However, the sites of origin of the bronchial arteries (BAs) have numerous anatomical variations, which can result in a technical challenge to identify a bleeding artery. We present a rare case of a left inferior BA that originated from the left gastric artery in a patient with recurrent massive hemoptysis caused by bronchiectasis. The aberrant BA was embolized, and hemoptysis has been controlled for 8 months.

  5. Unconventional minimal subtraction and Bogoliubov-Parasyuk-Hepp-Zimmermann method: Massive scalar theory and critical exponents

    SciTech Connect (OSTI)

    Carvalho, Paulo R. S.; Leite, Marcelo M.

    2013-09-15

    We introduce a simpler although unconventional minimal subtraction renormalization procedure in the case of a massive scalar ??{sup 4} theory in Euclidean space using dimensional regularization. We show that this method is very similar to its counterpart in massless field theory. In particular, the choice of using the bare mass at higher perturbative order instead of employing its tree-level counterpart eliminates all tadpole insertions at that order. As an application, we compute diagrammatically the critical exponents ? and ? at least up to two loops. We perform an explicit comparison with the Bogoliubov-Parasyuk-Hepp-Zimmermann (BPHZ) method at the same loop order, show that the proposed method requires fewer diagrams and establish a connection between the two approaches.

  6. SUPER-CRITICAL GROWTH OF MASSIVE BLACK HOLES FROM STELLAR-MASS SEEDS

    SciTech Connect (OSTI)

    Madau, Piero; Haardt, Francesco; Dotti, Massimo

    2014-04-01

    We consider super-critical accretion with angular momentum onto stellar-mass black holes as a possible mechanism for growing billion-solar-mass black holes from light seeds at early times. We use the radiatively inefficient ''slim disk'' solutionadvective, optically thick flows that generalize the standard geometrically thin disk modelto show how mildly super-Eddington intermittent accretion may significantly ease the problem of assembling the first massive black holes when the universe was less than 0.8Gyr old. Because of the low radiative efficiencies of slim disks around non-rotating as well as rapidly rotating black holes, the mass e-folding timescale in this regime is nearly independent of the spin parameter. The conditions that may lead to super-critical growth in the early universe are briefly discussed.

  7. Impurity mixing and radiation asymmetry in massive gas injection simulations of DIII-D

    SciTech Connect (OSTI)

    Izzo, V. A.

    2013-05-15

    Simulations of neon massive gas injection into DIII-D are performed with the 3D MHD code NIMROD. The poloidal and toroidal distribution of the impurity source is varied. This report will focus on the effects of the source variation on impurity mixing and radiated power asymmetry. Even toroidally symmetric impurity injection is found to produce asymmetric radiated power due to asymmetric convective heat flux produced by the 1/1 mode. When the gas source is toroidally localized, the phase relationship between the mode and the source location is important, affecting both radiation peaking and impurity mixing. Under certain circumstances, a single, localized gas jet could produce better radiation symmetry during the disruption thermal quench than evenly distributed impurities.

  8. Gravitational waves from the collision of tidally disrupted stars with massive black holes

    SciTech Connect (OSTI)

    East, William E.

    2014-11-10

    We use simulations of hydrodynamics coupled with full general relativity to investigate the gravitational waves produced by a star colliding with a massive black hole when the star's tidal disruption radius lies far outside of the black hole horizon. We consider both main-sequence and white-dwarf compaction stars, and nonspinning black holes, as well as those with near-extremal spin. We study the regime in between where the star can be accurately modeled by a point particle, and where tidal effects completely suppress the gravitational wave signal. We find that nonnegligible gravitational waves can be produced even when the star is strongly affected by tidal forces, as well as when it collides with large angular momentum. We discuss the implications that these results have for the potential observation of gravitational waves from these sources with future detectors.

  9. Analytical solutions for radiation-driven winds in massive stars. I. The fast regime

    SciTech Connect (OSTI)

    Araya, I.; Cur, M.; Cidale, L. S.

    2014-11-01

    Accurate mass-loss rate estimates are crucial keys in the study of wind properties of massive stars and for testing different evolutionary scenarios. From a theoretical point of view, this implies solving a complex set of differential equations in which the radiation field and the hydrodynamics are strongly coupled. The use of an analytical expression to represent the radiation force and the solution of the equation of motion has many advantages over numerical integrations. Therefore, in this work, we present an analytical expression as a solution of the equation of motion for radiation-driven winds in terms of the force multiplier parameters. This analytical expression is obtained by employing the line acceleration expression given by Villata and the methodology proposed by Mller and Vink. On the other hand, we find useful relationships to determine the parameters for the line acceleration given by Mller and Vink in terms of the force multiplier parameters.

  10. Analytical Expressions for the Hard-Scattering Production of Massive Partons

    SciTech Connect (OSTI)

    Wong, Cheuk-Yin

    2016-01-01

    We obtain explicit expressions for the two-particle differential cross section $E_c E_\\kappa d\\sigma (AB \\to c\\kappa X) /d\\bb c d \\bb \\kappa$ and the two-particle angular correlation function \\break $d\\sigma(AB$$ \\to$$ c\\kappa X)/d\\Delta \\phi \\, d\\Delta y$ in the hard-scattering production of massive partons in order to exhibit the ``ridge" structure on the away side in the hard-scattering process. The single-particle production cross section $d\\sigma(AB \\to cX) /dy_c c_T dc_T $ is also obtained and compared with the ALICE experimental data for charm production in $pp$ collisions at 7 TeV at LHC.

  11. Status of ParaDyn: DYNA3D for parallel computing

    SciTech Connect (OSTI)

    Goudreau, G.L.; Hoover, C.G.; DeGrout, A.J.; Raboin, P.J.

    1996-04-17

    The evolution of DYNA3D from a vector supercomputer code into a parallel code is reviewed. Current status and target applications, especially those of interest to the Department of Defense.

  12. Parallel Assembly of Large Genomes from Paired Short Reads (2010 JGI/ANL HPC Workshop)

    ScienceCinema (OSTI)

    Aluru, Srinivas [Iowa State University

    2011-06-08

    Srinivas Aluru from Iowa State University gives a presentation on "Parallel Assembly of Large Genomes from Paired Short Reads" at the JGI/Argonne HPC Workshop on January 25, 2010.

  13. penORNL: a parallel monte carlo photon and electron transport package using PENELOPE

    SciTech Connect (OSTI)

    Bekar, Kursat B.; Miller, Thomas Martin; Patton, Bruce W.; Weber, Charles F.

    2015-01-01

    The parallel Monte Carlo photon and electron transport code package penORNL was developed at Oak Ridge National Laboratory to enable advanced scanning electron microscope (SEM) simulations on high performance computing systems. This paper discusses the implementations, capabilities and parallel performance of the new code package. penORNL uses PENELOPE for its physics calculations and provides all available PENELOPE features to the users, as well as some new features including source definitions specifically developed for SEM simulations, a pulse-height tally capability for detailed simulations of gamma and x-ray detectors, and a modified interaction forcing mechanism to enable accurate energy deposition calculations. The parallel performance of penORNL was extensively tested with several model problems, and very good linear parallel scaling was observed with up to 512 processors. penORNL, along with its new features, will be available for SEM simulations upon completion of the new pulse-height tally implementation.

  14. NGC1277: A MASSIVE COMPACT RELIC GALAXY IN THE NEARBY UNIVERSE

    SciTech Connect (OSTI)

    Trujillo, Ignacio; Vazdekis, Alexandre; Balcells, Marc; Snchez-Blzquez, Patricia

    2014-01-10

    As early as 10Gyr ago, galaxies with more than 10{sup 11} M {sub ?} of stars already existed. While most of these massive galaxies must have subsequently transformed through on-going star formation and mergers with other galaxies, a small fraction (?0.1%) may have survived untouched until today. Searches for such relic galaxies, useful windows to explore the early universe, have been inconclusive to date: galaxies with masses and sizes like those observed at high redshift (M {sub *} ? 10{sup 11} M {sub ?}; R{sub e} ? 1.5kpc) have been found in the local universe, but their stars are far too young for the galaxy to be a relic galaxy. This paper explores the first case of a nearby galaxy, NGC1277 (at a distance of 73 Mpc in the Perseus galaxy cluster), which fulfills many criteria to be considered a relic galaxy. Using deep optical spectroscopy, we derive the star formation history along the structure of the galaxy: the stellar populations are uniformly old (>10Gyr) with no evidence for more recent star formation episodes. The metallicity of their stars is super-solar ([Fe/H] = 0.20 0.04 with a smooth decline toward the outer regions) and ?-enriched ([?/Fe] = 0.4 0.1). This suggests a very short formation time scale for the bulk of the stars in this galaxy. This object also rotates very fast (V {sub rot} ? 300kms{sup 1}) and has a large central velocity dispersion (? > 300kms{sup 1}). NGC1277 allows the exploration in full detail of properties such as the structure, internal dynamics, metallicity, and initial mass function as they were at ?10-12Gyr ago when the first massive galaxies were built.

  15. JELLYFISH: EVIDENCE OF EXTREME RAM-PRESSURE STRIPPING IN MASSIVE GALAXY CLUSTERS

    SciTech Connect (OSTI)

    Ebeling, H.; Stephenson, L. N.; Edge, A. C.

    2014-02-01

    Ram-pressure stripping by the gaseous intracluster medium has been proposed as the dominant physical mechanism driving the rapid evolution of galaxies in dense environments. Detailed studies of this process have, however, largely been limited to relatively modest examples affecting only the outermost gas layers of galaxies in nearby and/or low-mass galaxy clusters. We here present results from our search for extreme cases of gas-galaxy interactions in much more massive, X-ray selected clusters at z > 0.3. Using Hubble Space Telescope snapshots in the F606W and F814W passbands, we have discovered dramatic evidence of ram-pressure stripping in which copious amounts of gas are first shock compressed and then removed from galaxies falling into the cluster. Vigorous starbursts triggered by this process across the galaxy-gas interface and in the debris trail cause these galaxies to temporarily become some of the brightest cluster members in the F606W passband, capable of outshining even the Brightest Cluster Galaxy. Based on the spatial distribution and orientation of systems viewed nearly edge-on in our survey, we speculate that infall at large impact parameter gives rise to particularly long-lasting stripping events. Our sample of six spectacular examples identified in clusters from the Massive Cluster Survey, all featuring M {sub F606W} < 21 mag, doubles the number of such systems presently known at z > 0.2 and facilitates detailed quantitative studies of the most violent galaxy evolution in clusters.

  16. Fragmentation of massive dense cores down to ? 1000 AU: Relation between fragmentation and density structure

    SciTech Connect (OSTI)

    Palau, Aina; Girart, Josep M.; Estalella, Robert; Fuente, Asuncin; Fontani, Francesco; Snchez-Monge, lvaro; Commeron, Benoit; Hennebelle, Patrick; Busquet, Gemma; Bontemps, Sylvain; Zapata, Luis A.; Zhang, Qizhou; Di Francesco, James

    2014-04-10

    In order to shed light on the main physical processes controlling fragmentation of massive dense cores, we present a uniform study of the density structure of 19 massive dense cores, selected to be at similar evolutionary stages, for which their relative fragmentation level was assessed in a previous work. We inferred the density structure of the 19 cores through a simultaneous fit of the radial intensity profiles at 450 and 850 ?m (or 1.2 mm in two cases) and the spectral energy distribution, assuming spherical symmetry and that the density and temperature of the cores decrease with radius following power-laws. Even though the estimated fragmentation level is strictly speaking a lower limit, its relative value is significant and several trends could be explored with our data. We find a weak (inverse) trend of fragmentation level and density power-law index, with steeper density profiles tending to show lower fragmentation, and vice versa. In addition, we find a trend of fragmentation increasing with density within a given radius, which arises from a combination of flat density profile and high central density and is consistent with Jeans fragmentation. We considered the effects of rotational-to-gravitational energy ratio, non-thermal velocity dispersion, and turbulence mode on the density structure of the cores, and found that compressive turbulence seems to yield higher central densities. Finally, a possible explanation for the origin of cores with concentrated density profiles, which are the cores showing no fragmentation, could be related with a strong magnetic field, consistent with the outcome of radiation magnetohydrodynamic simulations.

  17. FROM DUSTY FILAMENTS TO MASSIVE STARS: THE CASE OF NGC 7538 S

    SciTech Connect (OSTI)

    Naranjo-Romero, Raul; Zapata, Luis A.; Vazquez-Semadeni, Enrique; Takahashi, Satoko; Palau, Aina; Schilke, Peter

    2012-09-20

    We report on high-sensitivity and high angular resolution archival Submillimeter Array observations of the large ({approx}15,000 AU) putative circumstellar disk associated with the O-type protostar NGC 7538 S. Observations of the continuum resolve this putative circumstellar disk into five compact sources, with sizes {approx}3000 AU and masses {approx}10 M{sub Sun }. This confirms the results of recent millimeter observations made with CARMA/BIMA toward this object. However, we find that most of these compact sources eject collimated bipolar outflows, revealed by our silicon monoxide (SiO J = 5-4) observations, and confirm that these sources have a (proto)stellar nature. All outflows are perpendicular to the large and rotating dusty structure. We propose therefore that, rather than being a single massive circumstellar disk, NGC 7538 S could instead be a large and massive contracting or rotating filament that is fragmenting at scales of 0.1-0.01 pc to form several B-type stars, via the standard process involving outflows and disks. As in recent high spatial resolution studies of dusty filaments, our observations also suggest that thermal pressure does not seem to be sufficient to support the filament, so that either additional support needs to be invoked or else the filament must be in the process of collapsing. A smoothed particle hydrodynamics numerical simulation of the formation of a molecular cloud by converging warm neutral medium flows produces contracting filaments whose dimensions and spacings between the stars forming within them, as well as their column densities, strongly resemble those observed in the filament reported here.

  18. Radiation-Hydrodynamic Simulations of Massive Star Formation with Protostellar Outflows

    SciTech Connect (OSTI)

    Cunningham, A J; Klein, R I; Krumholz, M R; McKee, C F

    2011-03-02

    We report the results of a series of AMR radiation-hydrodynamic simulations of the collapse of massive star forming clouds using the ORION code. These simulations are the first to include the feedback effects protostellar outflows, as well as protostellar radiative heating and radiation pressure exerted on the infalling, dusty gas. We find that that outflows evacuate polar cavities of reduced optical depth through the ambient core. These enhance the radiative flux in the poleward direction so that it is 1.7 to 15 times larger than that in the midplane. As a result the radiative heating and outward radiation force exerted on the protostellar disk and infalling cloud gas in the equatorial direction are greatly diminished. The simultaneously reduces the Eddington radiation pressure barrier to high-mass star formation and increases the minimum threshold surface density for radiative heating to suppress fragmentation compared to models that do not include outflows. The strength of both these effects depends on the initial core surface density. Lower surface density cores have longer free-fall times and thus massive stars formed within them undergo more Kelvin contraction as the core collapses, leading to more powerful outflows. Furthermore, in lower surface density clouds the ratio of the time required for the outflow to break out of the core to the core free-fall time is smaller, so that these clouds are consequently influenced by outflows at earlier stages of collapse. As a result, outflow effects are strongest in low surface density cores and weakest in high surface density one. We also find that radiation focusing in the direction of outflow cavities is sufficient to prevent the formation of radiation pressure-supported circumstellar gas bubbles, in contrast to models which neglect protostellar outflow feedback.

  19. Performance of a VME-based parallel processing LIDAR data acquisition system (summary)

    SciTech Connect (OSTI)

    Moore, K.; Buttler, B.; Caffrey, M.; Soriano, C.

    1995-05-01

    It may be possible to make accurate real time, autonomous, 2 and 3 dimensional wind measurements remotely with an elastic backscatter Light Detection and Ranging (LIDAR) system by incorporating digital parallel processing hardware into the data acquisition system. In this paper, we report the performance of a commercially available digital parallel processing system in implementing the maximum correlation technique for wind sensing using actual LIDAR data. Timing and numerical accuracy are benchmarked against a standard microprocessor impementation.

  20. Concurrent Collections (CnC): A new approach to parallel programming

    ScienceCinema (OSTI)

    None

    2011-10-06

    A common approach in designing parallel languages is to provide some high level handles to manipulate the use of the parallel platform. This exposes some aspects of the target platform, for example, shared vs. distributed memory. It may expose some but not all types of parallelism, for example, data parallelism but not task parallelism. This approach must find a balance between the desire to provide a simple view for the domain expert and provide sufficient power for tuning. This is hard for any given architecture and harder if the language is to apply to a range of architectures. Either simplicity or power is lost. Instead of viewing the language design problem as one of providing the programmer with high level handles, we view the problem as one of designing an interface. On one side of this interface is the programmer (domain expert) who knows the application but needs no knowledge of any aspects of the platform. On the other side of the interface is the performance expert (programmer or program) who demands maximal flexibility for optimizing the mapping to a wide range of target platforms (parallel / serial, shared / distributed, homogeneous / heterogeneous, etc.) but needs no knowledge of the domain. Concurrent Collections (CnC) is based on this separation of concerns. The talk will present CnC and its benefits. About the speaker Kathleen Knobe has focused throughout her career on parallelism especially compiler technology, runtime system design and language design. She worked at Compass (aka Massachusetts Computer Associates) from 1980 to 1991 designing compilers for a wide range of parallel platforms for Thinking Machines, MasPar, Alliant, Numerix, and several government projects. In 1991 she decided to finish her education. After graduating from MIT in 1997, she joined Digital Equipment?s Cambridge Research Lab (CRL). She stayed through the DEC/Compaq/HP mergers and when CRL was acquired and absorbed by Intel. 
She currently works in the Software and Services Group / Technology Pathfinding and Innovation.

  1. Parallel Botulinum Neurotoxin/A Immuno- and Enzyme Activity Assays Using

    Office of Scientific and Technical Information (OSTI)

    the Versatile RapiDx Platform. (Conference) | SciTech Connect SciTech Connect Search Results Conference: Parallel Botulinum Neurotoxin/A Immuno- and Enzyme Activity Assays Using the Versatile RapiDx Platform. Citation Details In-Document Search Title: Parallel Botulinum Neurotoxin/A Immuno- and Enzyme Activity Assays Using the Versatile RapiDx Platform. Abstract not provided. Authors: Sommer, Gregory Jon ; Wang, Ying-Chih ; Singh, Anup K. ; Hatch, Anson V. ; Ravichandran, Easwaran ; Singh,

  2. Progress on H5Part: A Portable High Performance Parallel DataInterface for

    Office of Scientific and Technical Information (OSTI)

    Electromagnetics Simulations (Conference) | SciTech Connect Progress on H5Part: A Portable High Performance Parallel DataInterface for Electromagnetics Simulations Citation Details In-Document Search Title: Progress on H5Part: A Portable High Performance Parallel DataInterface for Electromagnetics Simulations Significant problems facing all experimental andcomputationalsciences arise from growing data size and complexity. Commonto allthese problems is the need to perform efficient data I/O

  3. Portable Parallel Beam X-Ray Diffraction System | Department of Energy

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Portable Parallel Beam X-Ray Diffraction System Portable Parallel Beam X-Ray Diffraction System New, Low Power System Reduces Energy Consumption and Improves Process Efficiency Real-time, nondestructive, in-line measurements of material properties are needed for process control in metallurgical manufacturing. With AMO support, X-Ray Optical Systems, Inc., developed the X-Beam®, a portable x-ray diffraction (XRD) system that can be used to identify structural phases, determine grain size, and

  4. Cpl6: The New Extensible, High-Performance Parallel Coupler forthe

    Office of Scientific and Technical Information (OSTI)

    Community Climate System Model (Journal Article) | SciTech Connect Cpl6: The New Extensible, High-Performance Parallel Coupler forthe Community Climate System Model Citation Details In-Document Search Title: Cpl6: The New Extensible, High-Performance Parallel Coupler forthe Community Climate System Model Coupled climate models are large, multiphysics applications designed to simulate the Earth's climate and predict the response of the climate to any changes in the forcing or boundary

  5. High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and

    Office of Scientific and Technical Information (OSTI)

    Delaunay Tessellation (Conference) | SciTech Connect SciTech Connect Search Results Conference: High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation Citation Details In-Document Search Title: High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations,

  6. A PARALLEL-PROPAGATING ALFVENIC ION-BEAM INSTABILITY IN THE HIGH-BETA SOLAR WIND

    SciTech Connect (OSTI)

    Verscharen, Daniel; Bourouaine, Sofiane; Chandran, Benjamin D. G.; Maruca, Bennett A. E-mail: s.bourouaine@unh.edu E-mail: bmaruca@ssl.berkeley.edu

    2013-08-10

    We investigate the conditions under which parallel-propagating Alfven/ion-cyclotron waves are driven unstable by an isotropic (T{sub {alpha}} = T{sub Parallel-To {alpha}}) population of alpha particles drifting parallel to the magnetic field at an average speed U{sub {alpha}} with respect to the protons. We derive an approximate analytic condition for the minimum value of U{sub {alpha}} needed to excite this instability and refine this result using numerical solutions to the hot-plasma dispersion relation. When the alpha-particle number density is {approx_equal} 5% of the proton number density and the two species have similar thermal speeds, the instability requires that {beta}{sub p} {approx}> 1, where {beta}{sub p} is the ratio of the proton pressure to the magnetic pressure. For 1 {approx}< {beta}{sub p} {approx}< 12, the minimum U{sub {alpha}} needed to excite this instability ranges from 0.7v{sub A} to 0.9v{sub A}, where v{sub A} is the Alfven speed. This threshold is smaller than the threshold of {approx_equal} 1.2v{sub A} for the parallel magnetosonic instability, which was previously thought to have the lowest threshold of the alpha-particle beam instabilities at {beta}{sub p} {approx}> 0.5. We discuss the role of the parallel Alfvenic drift instability for the evolution of the alpha-particle drift speed in the solar wind. We also analyze measurements from the Wind spacecraft's Faraday cups and show that the U{sub {alpha}} values measured in solar-wind streams with T{sub {alpha}} Almost-Equal-To T{sub Parallel-To {alpha}} are approximately bounded from above by the threshold of the parallel Alfvenic instability.

  7. THE ORIGIN OF METALS IN THE CIRCUMGALACTIC MEDIUM OF MASSIVE GALAXIES AT z = 3

    SciTech Connect (OSTI)

    Shen Sijing; Madau, Piero; Aguirre, Anthony; Guedes, Javiera [Department of Astronomy and Astrophysics, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064 (United States); Mayer, Lucio [Institute of Theoretical Physics, University of Zurich, Winterthurerstrasse 190, CH-9057 Zurich (Switzerland); Wadsley, James [Department of Physics and Astronomy, McMaster University, Main Street West, Hamilton L8S 4M1 (Canada)

    2012-11-20

    We present a detailed study of the metal-enriched circumgalactic medium (CGM) of a massive galaxy at z = 3 using results from 'ErisMC', a new cosmological hydrodynamic 'zoom-in' simulation of a disk galaxy with mass comparable to the Milky Way. The reference run adopts a blast wave scheme for supernova feedback that generates galactic outflows without explicit wind particles, a star formation recipe based on a high gas density threshold and high-temperature metal cooling. ErisMC's main progenitor at z = 3 resembles a 'Lyman break' galaxy of total mass M {sub vir} = 2.4 Multiplication-Sign 10{sup 11} M {sub Sun }, virial radius R {sub vir} = 48 kpc, and star formation rate 18 M {sub Sun} yr{sup -1}, and its metal-enriched CGM extends as far as 200 (physical) kpc from its center. Approximately 41%, 9%, and 50% of all gas-phase metals at z = 3 are locked in a hot (T > 3 Multiplication-Sign 10{sup 5} K), warm (3 Multiplication-Sign 10{sup 5} K > T > 3 Multiplication-Sign 10{sup 4} K), and cold (T < 3 Multiplication-Sign 10{sup 4} K) medium, respectively. We identify three sources of heavy elements: (1) the main host, responsible for 60% of all the metals found within 3 R {sub vir}; (2) its satellite progenitors, which shed their metals before and during infall, and are responsible for 28% of all the metals within 3 R {sub vir}, and for only 5% of those beyond 3 R {sub vir}; and (3) nearby dwarfs, which give origin to 12% of all the metals within 3 R {sub vir} and 95% of those beyond 3 R {sub vir}. Late (z < 5) galactic 'superwinds'-the result of recent star formation in ErisMC-account for only 9% of all the metals observed beyond 2 R {sub vir}, the bulk having been released at redshifts 5 {approx}< z {approx}< 8 by early star formation and outflows. In the CGM, lower overdensities are typically enriched by 'older', colder metals. 
Heavy elements are accreted onto ErisMC along filaments via low-metallicity cold inflows and are ejected hot via galactic outflows at a few hundred km s{sup -1}. The outflow mass-loading factor is of order unity for the main halo, but can exceed a value of 10 for nearby dwarfs. We stress that our 'zoom-in' simulation focuses on the CGM of a single massive system and cannot describe the enrichment history of the intergalactic medium as a whole by a population of galaxies with different masses and star formation histories.

  8. Star formation and cosmic massive black hole formation, a universal process organized by angular momenta

    SciTech Connect (OSTI)

    Colgate, S. A.

    2004-01-01

    It is suggested that star formation is organized following the same principles as we have applied in a recent explanation of galaxy and massive black hole formation. In this scenario angular momentum is randomly distributed by tidal torquing among condensations, Lyman-{alpha} clouds or cores for star formation during the initial non-linear phase of collapse. This angular momentum is characterized by the parameter, {lambda}, the ratio of the angular momentum of the cloud to that of a Keplerian orbit with the same central mass and radius. This parameter is calculated in very many simulations of structure formation of the universe as well as core formation and appears to be universal and independent of any scale. The specific angular momentum during the collapse of every cloud is locally conserved and universally produces a near flat rotation curve M{sub massive galactic black hole, 10{sup 8} M{sub o}, ({sup -}10{sup -3} of the galactic disk mass) or 1 M{sub o} ({sup -}0.03 of the core or of the protostellar disk mass). The inviscid collapse of a protosteller core with the same average {lambda} = 0.05 leads to the formation of a flat rotation curve (protostellar) disk of mass M{sub dsk} {sup -}30 M{sub o} of radius R{sub dsk} {approx_equal} 1100 AU or 5.4 x 10{sup -3} pc. In such a disk {Sigma} {proportional_to} 1/R and reaches the RVI condition at R{sub crit} {approx_equal} 40 AU where M{sub

  9. THE {sup 12}C + {sup 12}C REACTION AND THE IMPACT ON NUCLEOSYNTHESIS IN MASSIVE STARS

    SciTech Connect (OSTI)

    Pignatari, M.; Hirschi, R.; Bennett, M.; Wiescher, M.; Beard, M.; Gallino, R.; Fryer, C.; Rockefeller, G.; Herwig, F.; Timmes, F. X.

    2013-01-01

    Despite much effort in the past decades, the C-burning reaction rate is uncertain by several orders of magnitude, and the relative strength between the different channels {sup 12}C({sup 12}C, {alpha}){sup 20}Ne, {sup 12}C({sup 12}C, p){sup 23}Na, and {sup 12}C({sup 12}C, n){sup 23}Mg is poorly determined. Additionally, in C-burning conditions a high {sup 12}C+{sup 12}C rate may lead to lower central C-burning temperatures and to {sup 13}C({alpha}, n){sup 16}O emerging as a more dominant neutron source than {sup 22}Ne({alpha}, n){sup 25}Mg, significantly increasing the s-process production. This is due to the chain {sup 12}C(p, {gamma}){sup 13}N followed by {sup 13}N({beta}+){sup 13}C, where the reverse photodisintegration channel {sup 13}N({gamma}, p){sup 12}C is strongly decreasing with increasing temperature. Presented here is the impact of the {sup 12}C+{sup 12}C reaction rate uncertainties on the s-process and on explosive p-process nucleosynthesis in massive stars, including fast rotating massive stars at low metallicity. Using various {sup 12}C+{sup 12}C rates, in particular upper and lower rate limits {approx}50,000 times higher and {approx}20 times lower than the standard rate at 5 x 10{sup 8} K, five 25 M{sub Sun} stellar models are calculated. The enhanced s-process signature due to {sup 13}C({alpha}, n){sup 16}O activation is considered, taking into account the impact of the uncertainty of all three C-burning reaction branches. We show that the p-process abundances have an average production factor increased by up to about a factor of eight compared with the standard case, efficiently producing the elusive proton-rich isotopes of Mo and Ru. We also show that an s-process driven by {sup 13}C({alpha}, n){sup 16}O is a secondary process, even though the abundance of {sup 13}C does not depend on the initial metal content. Finally, implications for the inventory of Sr-peak elements in the solar system and at low metallicity are discussed.

  10. Computing contingency statistics in parallel : design trade-offs and limiting cases.

    SciTech Connect (OSTI)

    Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

    2010-06-01

    Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and {chi}{sup 2} independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics (which we discussed in [1]) where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
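
The map-reduce formulation described in the abstract can be made concrete. The sketch below is a minimal Python illustration (not the authors' implementation): each "processor" builds a local contingency table over its data chunk, and the reduce step merges tables by simple addition. All function names are illustrative.

```python
# Embarrassingly parallel contingency-table construction, map-reduce style.
# Map: count (category_x, category_y) pairs per data chunk.
# Reduce: merge local tables; merging counters is associative, so any
# reduction tree (or a map-reduce framework) yields the same result.
from collections import Counter
from functools import reduce

def map_count(chunk):
    """Map phase: build a local contingency table (pair -> count)."""
    return Counter(chunk)

def reduce_merge(a, b):
    """Reduce phase: merging two local tables is counter addition."""
    return a + b

def contingency_table(chunks):
    return reduce(reduce_merge, (map_count(c) for c in chunks), Counter())

# Two "processors", each observing (x, y) category pairs.
chunks = [[("a", 1), ("a", 1), ("b", 2)], [("a", 1), ("b", 1)]]
table = contingency_table(chunks)
print(table[("a", 1)])  # → 3
```

Note that the merged table grows with the number of distinct category pairs, not with the raw data size; this is exactly the communication-volume concern the abstract raises for quasi-diffuse inputs.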

  11. Computing contingency statistics in parallel : design trade-offs and limiting cases.

    SciTech Connect (OSTI)

    Thompson, David C.; Bennett, Janine C.; Pebay, Philippe Pierre

    2010-03-01

    Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and {chi}{sup 2} independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics (which we discussed in [1]) where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.

  12. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; Shende, Sameer; Kassinos, Stavros C.

    2015-01-01

    This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure; Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
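
The binary-tree summation that replaced the sequential reduction can be sketched in a few lines. This is a serial Python illustration of the combining pattern only; the actual code uses Fortran coarrays across images, and all names here are illustrative.

```python
# Pairwise (binary-tree) reduction: P partial sums are combined in
# ceil(log2(P)) passes instead of P - 1 sequential additions, which is
# what restored near-linear scaling in the paper.
def tree_sum(partials):
    """Sum a list of per-image partial values by pairwise combination."""
    vals = list(partials)
    while len(vals) > 1:
        nxt = []
        for i in range(0, len(vals) - 1, 2):
            nxt.append(vals[i] + vals[i + 1])  # neighbor pairs combine in parallel
        if len(vals) % 2:                      # an odd image passes its value through
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

print(tree_sum([1.0, 2.0, 3.0, 4.0, 5.0]))  # → 15.0
```

Each pass halves the number of outstanding partial sums, so the combining depth, and hence the serial bottleneck, grows only logarithmically with the image count.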

  13. The MassiveBlack-II simulation: The evolution of haloes and galaxies to z ~ 0

    SciTech Connect (OSTI)

    Khandai, Nishikanta; Di Matteo, Tiziana; Croft, Rupert; Wilkins, Stephen; Feng, Yu; Tucker, Evan; DeGraf, Colin; Liu, Mao-Sheng

    2015-04-24

    We investigate the properties and clustering of halos, galaxies and black holes to z = 0 in the high resolution hydrodynamical simulation MassiveBlack-II (MBII). MBII evolves a ΛCDM cosmology in a cubical comoving volume Vbox = (100 Mpc/h)³. It is the highest resolution simulation of this size which includes a self-consistent model for star formation, black hole accretion and associated feedback. We provide a simulation browser web application which enables interactive search and tagging of the halos, subhalos and their properties, and publicly release our galaxy catalogs to the scientific community. Our analysis of the halo mass function in MBII reveals that baryons have strong effects, with changes in the halo abundance of 20–35% below the knee of the mass function (Mhalo ≈ 10^13.2 M⊙/h at z = 0) when compared to dark-matter-only simulations. We provide a fitting function for the halo MF out to redshift z = 11 and discuss its limitations.

  14. First results on disruption mitigation by massive gas injection in Korea Superconducting Tokamak Advanced Research

    SciTech Connect (OSTI)

    Yu Yaowei; Kim, Young-Ok; Kim, Hak-Kun; Kim, Hong-Tack; Kim, Woong-Chae; Kim, Kwang-Pyo; Son, Soo-Hyun; Bang, Eun-Nam; Hong, Suk-Ho; Yoon, Si-Woo; Zhuang Huidong; Chen Zhongyong

    2012-12-15

    A massive gas injection (MGI) system was developed on Korea Superconducting Tokamak Advanced Research (KSTAR) during the 2011 campaign for disruption studies. The MGI valve has a volume of 80 ml and a maximum injection pressure of 50 bar; the diameter of the valve orifice to the vacuum vessel is 18.4 mm, and the distance between the MGI valve and the plasma edge is {approx}3.4 m. The MGI power supply employs a large capacitor of 1 mF with a maximum voltage of 3 kV; the valve can be opened in less than 0.1 ms, and the amount of injected gas can be controlled by the imposed voltage. During the KSTAR 2011 campaign, MGI disruptions were produced by triggering MGI during the flat top of circular and limiter discharges with plasma current 400 kA, magnetic field 2-3.5 T, deuterium injection pressure 39.7 bar, and imposed voltage 1.1-1.4 kV. The results show that MGI can mitigate the heat load and prevent runaway electrons with a proper MGI amount, and that MGI penetration is deeper at a higher injected amount or a lower magnetic field. However, plasma start-up is difficult after some of the D{sub 2} MGI disruptions because of high deuterium retention and the consequent strong outgassing of deuterium in the next shot; special effort is needed to achieve successful plasma start-up after deuterium MGI with the graphite first wall.

  15. THE STELLAR HALOS OF MASSIVE ELLIPTICAL GALAXIES. II. DETAILED ABUNDANCE RATIOS AT LARGE RADIUS

    SciTech Connect (OSTI)

    Greene, Jenny E.; Murphy, Jeremy D.; Graves, Genevieve J.; Gunn, James E.; Raskutti, Sudhir; Comerford, Julia M.; Gebhardt, Karl

    2013-10-20

    We study the radial dependence in stellar populations of 33 nearby early-type galaxies with central stellar velocity dispersions σ{sub *} ≳ 150 km s{sup -1}. We measure stellar population properties in composite spectra, and use ratios of these composites to highlight the largest spectral changes as a function of radius. Based on stellar population modeling, the typical star at 2R{sub e} is old (~10 Gyr), relatively metal-poor ([Fe/H] ≈ -0.5), and α-enhanced ([Mg/Fe] ≈ 0.3). The stars were made rapidly at z ≈ 1.5-2 in shallow potential wells. Declining radial gradients in [C/Fe], which follow [Fe/H], also arise from rapid star formation timescales due to declining carbon yields from low-metallicity massive stars. In contrast, [N/Fe] remains high at large radius. Stars at large radius have different abundance ratio patterns from stars in the center of any present-day galaxy, but are similar to average Milky Way thick disk stars. Our observations are thus consistent with a picture in which the stellar outskirts are built up through minor mergers with disky galaxies whose star formation is truncated early (z ≈ 1.5-2).

  16. The outcome of supernovae in massive binaries; removed mass, and its separation dependence

    SciTech Connect (OSTI)

    Hirai, Ryosuke; Sawai, Hidetomo; Yamada, Shoichi [Advanced Research Institute for Science and Engineering, Waseda University, 3-4-1, Okubo, Shinjuku, Tokyo 169-8555 (Japan)

    2014-09-01

    The majority of massive stars are formed in binary systems. It is hence reasonable to expect that most core-collapse supernovae (CCSNe) take place in binaries and that the existence of a companion star may leave some imprints in observed features. With this in mind, we have conducted two-dimensional hydrodynamical simulations of the collisions of CCSNe ejecta with the companion star in an almost-equal-mass (≈10 M{sub ⊙}) binary to find out the possible consequences of such events. In particular we pay attention to the amount of mass removed and its dependence on the binary separation. In contrast to the previous surmise, we find that the companion mass is stripped not by momentum transfer but by shock heating. Up to 25% of the original mass can be removed for the closest separations, and the removed mass decreases as M{sub ub} ∝ a{sup -4.3} with the binary separation a. By performing some experimental computations with artificially modified densities of incident ejecta, we show that if the velocity of ejecta is fixed, the density of the incident ejecta is the single important parameter that determines the removed mass as M{sub ub} ∝ ρ{sub ej}{sup 1.4}. On the other hand, another set of simulations with modified velocities of incident ejecta demonstrates that the strength of the forward shock, which heats up the stellar material and causes the mass loss of the companion star, is actually the key parameter for the removed mass.
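
As a quick numeric illustration of the fitted separation scaling (removed mass falling off with binary separation as a power law of index 4.3), the sketch below anchors the relation at the abstract's quoted maximum of 25% stripped at the closest separation. The normalization point a0 = 1 is an arbitrary unit choice, not a value from the paper.

```python
# Removed companion-mass fraction under the abstract's power-law fit,
# M_ub ∝ a**(-4.3), anchored so that f(a0) = 0.25 (25% at closest approach).
# a0, and the use of a simple anchored power law, are illustrative assumptions.
def removed_fraction(a, a0=1.0, f0=0.25, index=4.3):
    """Fraction of companion mass removed at separation a."""
    return f0 * (a / a0) ** (-index)

# Doubling the separation suppresses stripping by a factor 2**4.3 ≈ 20.
print(round(removed_fraction(2.0) / removed_fraction(1.0), 4))  # → 0.0508
```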

  17. The MassiveBlack-II simulation: The evolution of haloes and galaxies to z ~ 0

    DOE Public Access Gateway for Energy & Science Beta (PAGES Beta)

    Khandai, Nishikanta; Di Matteo, Tiziana; Croft, Rupert; Wilkins, Stephen; Feng, Yu; Tucker, Evan; DeGraf, Colin; Liu, Mao-Sheng

    2015-04-24

    We investigate the properties and clustering of halos, galaxies and black holes to z = 0 in the high resolution hydrodynamical simulation MassiveBlack-II (MBII). MBII evolves a ΛCDM cosmology in a cubical comoving volume Vbox = (100 Mpc/h)³. It is the highest resolution simulation of this size which includes a self-consistent model for star formation, black hole accretion and associated feedback. We provide a simulation browser web application which enables interactive search and tagging of the halos, subhalos and their properties, and publicly release our galaxy catalogs to the scientific community. Our analysis of the halo mass function in MBII reveals that baryons have strong effects, with changes in the halo abundance of 20–35% below the knee of the mass function (Mhalo ≈ 10^13.2 M⊙/h at z = 0) when compared to dark-matter-only simulations. We provide a fitting function for the halo MF out to redshift z = 11 and discuss its limitations.

  18. The structure of the invariant charge in massive theories with one coupling

    SciTech Connect (OSTI)

    Kraus, E.

    1995-06-01

    Invariance under finite renormalization group (RG) transformations is used to structure the invariant charge in models with one coupling in the four lowest orders of perturbation theory. In every order there starts an RG-invariant, which is uniquely continued to higher orders. Whereas in massless models the RG-invariants are power series in logarithms, there is no such requirement in a massive model. Only when one applies the Callan-Symanzik (CS) equation of the respective theory is the high-energy behavior of the RG-invariants restricted. In models where the CS-equation has the same form as the RG-equation, the massless limit is reached smoothly, i.e., the {beta}-functions are constants in the asymptotic limit and the RG-functions starting the new invariant tend to logarithms. On the other hand, in the spontaneously broken models with fermions the CS-equation contains a {beta}-function of a physical mass. As a consequence the {beta}-functions depend on the normalization point also in the asymptotic region, and a mass-independent limit no longer exists. {copyright} 1995 Academic Press, Inc.

  19. CHEMICAL SIGNATURE INDICATING A LACK OF MASSIVE STARS IN DWARF GALAXIES

    SciTech Connect (OSTI)

    Tsujimoto, Takuji, E-mail: taku.tsujimoto@nao.ac.jp [National Astronomical Observatory of Japan, Mitaka-shi, Tokyo 181-8588 (Japan)

    2011-08-01

    Growing evidence supports an unusual elemental feature appearing in nearby dwarf galaxies, especially dwarf spheroidals (dSphs), indicating a key process of galaxy evolution that is different from that of the Galaxy. In addition to the well-known deficiency of {alpha}-elements in dSphs, recent observations have clearly shown that s-process elements (Ba) are significantly enhanced relative to Fe, {alpha}-, and r-process elements. This enhancement occurs in some dSphs as well as in the Large Magellanic Cloud, but is unseen in the Galaxy. Here we report that this feature is evidence of the lack of very massive stars ({approx}>25 M{sub sun}) as predicted in the low star formation rate environment. We conclude that the unique elemental feature of dwarf galaxies including a low {alpha}/Fe ratio in some low-metallicity stars is, at least in some part, characterized by a different form of the initial mass function. We present a detailed model for the Fornax dSph galaxy and discuss its complex chemical enrichment history together with the nucleosynthesis site of the light s-process element Y.

  20. The parallel approach to force/position control of robotic manipulators

    SciTech Connect (OSTI)

    Chiaverini, S.; Sciavicco, L.

    1993-08-01

    Force/position control strategies provide an effective framework to deal with tasks involving interaction with the environment. In this paper the parallel approach to force/position control of robotic manipulators is presented. It allows a complete use of the available sensor measurements by operating the control action in a full-dimensional space without using selection matrices. Conflicting situations between the position and force tasks are managed using a priority strategy: the force control loop is designed to prevail over the position control loop. This choice ensures limited deviations from the prescribed force trajectory in every situation, guaranteeing automatic recovery from unplanned collisions. A dynamic force/position parallel control law is presented and its performance in the presence of an elastic environment is analyzed; simplification of the dynamic control law is also discussed, leading to a PID-type parallel controller. Two case studies are worked out that show the effectiveness of the approach in application to an industrial robot.

  1. Processing communications events in parallel active messaging interface by awakening thread from wait state

    DOE Patents [OSTI]

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-22

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
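
The wait/awaken cycle claimed above maps naturally onto a condition variable: an advance routine that finds no actionable events parks its thread, and a later event posting wakes it. The sketch below is an illustrative analogy in Python, not the patented PAMI implementation; every name in it is invented.

```python
# Sketch of an "advance" function that waits when no events are pending
# and is awakened when an event is posted for its context.
import threading

class Context:
    def __init__(self):
        self.events = []
        self.cond = threading.Condition()

    def post_event(self, ev):
        """A data communications event arrives for this context."""
        with self.cond:
            self.events.append(ev)
            self.cond.notify()            # awaken a waiting advance thread

    def advance(self):
        """Process the next event, sleeping (not spinning) until one exists."""
        with self.cond:
            while not self.events:        # no actionable events: wait state
                self.cond.wait()
            return self.events.pop(0)     # process the now-pending event

ctx = Context()
worker = threading.Thread(target=lambda: print("processed:", ctx.advance()))
worker.start()
ctx.post_event("send-complete")           # wakes the blocked advance call
worker.join()
```

The while-loop around `wait()` guards against spurious wakeups, the same reason the patent distinguishes "a subsequent event for the context" from any wakeup at all.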

  2. First experiments probing the collision of parallel magnetic fields using laser-produced plasmas

    SciTech Connect (OSTI)

    Rosenberg, M. J.; Li, C. K.; Fox, W.; Igumenshchev, I.; Seguin, F. H.; Town, R. P.; Frenje, J. A.; Stoeckl, C.; Glebov, V.; Petrasso, R. D.

    2015-04-08

    Novel experiments to study the strongly-driven collision of parallel magnetic fields in β~10, laser-produced plasmas have been conducted using monoenergetic proton radiography. These experiments were designed to probe the process of magnetic flux pileup, which has been identified in prior laser-plasma experiments as a key physical mechanism in the reconnection of anti-parallel magnetic fields when the reconnection inflow is dominated by strong plasma flows. In the present experiments using colliding plasmas carrying parallel magnetic fields, the magnetic flux is found to be conserved and slightly compressed in the collision region. Two-dimensional (2D) particle-in-cell (PIC) simulations predict a stronger flux compression and amplification of the magnetic field strength, and this discrepancy is attributed to the three-dimensional (3D) collision geometry. Future experiments may drive a stronger collision and further explore flux pileup in the context of the strongly-driven interaction of magnetic fields.

  3. Storing files in a parallel computing system based on user-specified parser function

    DOE Patents [OSTI]

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

    2014-10-21

    Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.
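
The parser-mediated storage flow can be sketched as follows. This is an illustrative analogy, not the patented mechanism: the parser signature, the `None`-means-reject convention, and the hash-based node selection are all assumptions made for the example.

```python
# Sketch: a user-supplied parser filters which files reach storage and
# extracts metadata that is stored alongside each accepted file.
def store_files(files, parser, num_nodes):
    """Route each accepted (name, data) file to a storage node."""
    nodes = {n: [] for n in range(num_nodes)}
    for name, data in files:
        meta = parser(name, data)
        if meta is None:                  # fails the parser's semantic requirements
            continue
        node = hash(name) % num_nodes     # illustrative node-selection rule
        nodes[node].append({"name": name, "data": data, "meta": meta})
    return nodes

# Example parser: keep only checkpoint files, recording size as metadata.
parser = lambda name, data: {"bytes": len(data)} if name.endswith(".ckpt") else None
nodes = store_files([("a.ckpt", b"1234"), ("b.log", b"x")], parser, num_nodes=2)
print(sum(len(v) for v in nodes.values()))  # → 1
```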

  4. A mirror for lab-based quasi-monochromatic parallel x-rays

    SciTech Connect (OSTI)

    Nguyen, Thanhhai; Lu, Xun; Lee, Chang Jun; Jeon, Insu; Jung, Jin-Ho; Jin, Gye-Hwan; Kim, Sung Youb

    2014-09-15

    A multilayered parabolic mirror with six W/Al bilayers was designed and fabricated to generate monochromatic parallel x-rays using a lab-based x-ray source. Using this mirror, curved bright bands were obtained in x-ray images as reflected x-rays. The parallelism of the reflected x-rays was investigated using the shape of the bands. The intensity and monochromatic characteristics of the reflected x-rays were evaluated through measurements of the x-ray spectra in the band. High intensity, nearly monochromatic, and parallel x-rays, which can be used for high resolution x-ray microscopes and local radiation therapy systems, were obtained.

  5. Parallel-wire grid assembly with method and apparatus for construction thereof

    DOE Patents [OSTI]

    Lewandowski, E.F.; Vrabec, J.

    1981-10-26

    Disclosed is a parallel wire grid and an apparatus and method for making the same. The grid consists of a generally coplanar array of parallel spaced-apart wires secured between metallic frame members by an electrically conductive epoxy. The method consists of continuously winding a wire about a novel winding apparatus comprising a plurality of spaced-apart generally parallel spindles. Each spindle is threaded with a number of predeterminedly spaced-apart grooves which receive and accurately position the wire at predetermined positions along the spindle. Overlying frame members coated with electrically conductive epoxy are then placed on either side of the wire array and are drawn together. After the epoxy hardens, portions of the wire array lying outside the frame members are trimmed away.

  6. Parallel-wire grid assembly with method and apparatus for construction thereof

    DOE Patents [OSTI]

    Lewandowski, Edward F. (Westmont, IL); Vrabec, John (South Holland, IL)

    1984-01-01

    Disclosed is a parallel wire grid and an apparatus and method for making the same. The grid consists of a generally coplanar array of parallel spaced-apart wires secured between metallic frame members by an electrically conductive epoxy. The method consists of continuously winding a wire about a novel winding apparatus comprising a plurality of spaced-apart generally parallel spindles. Each spindle is threaded with a number of predeterminedly spaced-apart grooves which receive and accurately position the wire at predetermined positions along the spindle. Overlying frame members coated with electrically conductive epoxy are then placed on either side of the wire array and are drawn together. After the epoxy hardens, portions of the wire array lying outside the frame members are trimmed away.

  7. Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes

    DOE Patents [OSTI]

    Jones, Terry R.; Watson, Pythagoras C.; Tuel, William; Brenner, Larry; Caffrey, Patrick; Fier, Jeffrey

    2010-10-05

    In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system uses a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized for applications written in large processor-count SPMD bulk-synchronous programming styles.

  8. The inside-out growth of the most massive galaxies at 0.3 < z < 0.9

    SciTech Connect (OSTI)

    Bai, Lei; Yee, H. K. C.; Li, I. H.; Yan, Renbin; Lee, Eve; Gilbank, David G.; Ellingson, E.; Barrientos, L. F.; Gladders, M. D.; Hsieh, B. C.

    2014-07-10

    We study the surface brightness profiles of a sample of brightest cluster galaxies (BCGs) with 0.3 < z < 0.9. The BCGs are selected from the first Red-sequence Cluster Survey and an X-ray cluster survey. The surface brightness profiles of the BCGs are measured using HST ACS images, and the majority of them can be well modeled by a single Sérsic profile with a typical Sérsic index n ≈ 6 and a half-light radius ≈30 kpc. Although the single Sérsic model fits the profiles well, we argue that the systematics in the sky background measurement and the coupling between the model parameters make the comparison of the best-fit model parameters ambiguous. Direct comparison of the BCG profiles, on the other hand, has revealed an inside-out growth for these most massive galaxies: as the mass of a BCG increases, the central mass density of the galaxy increases slowly (density within 1 kpc ∝ M{sub ⋆}{sup 0.39}), while the slope of the outer profile grows continuously shallower (∝ M{sub ⋆}{sup -2.5}). Such a fashion of growth continues down to the less massive early-type galaxies (ETGs) as a smooth function of galaxy mass, without apparent distinction between BCGs and non-BCGs. For the very massive ETGs and BCGs, the slope of the Kormendy relation starts to trace the slope of the surface brightness profiles and becomes insensitive to subtle profile evolution. These results are generally consistent with dry mergers being the major driver of the mass growth for BCGs and massive ETGs. We also find strong correlations between the richness of clusters and the properties of BCGs: the more massive the clusters are, the more massive the BCGs (M{sub bcg}{sup *} ∝ M{sub clusters}{sup 0.6}) and the shallower their surface brightness profiles. After taking into account the bias in the cluster samples, we find the masses of the BCGs have grown by at least a factor of 1.5 from z = 0.5 to z = 0, in contrast to the previous findings of no evolution. Such an evolution validates the expectation from the ΛCDM model.

  9. Scalability of preconditioners as a strategy for parallel computation of compressible fluid flow

    SciTech Connect (OSTI)

    Hansen, G.A.

    1996-05-01

    Parallel implementations of a Newton-Krylov-Schwarz algorithm are used to solve a model problem representing low Mach number compressible fluid flow over a backward-facing step. The Mach number is specifically selected to result in a numerically 'stiff' matrix problem, based on an implicit finite volume discretization of the compressible 2D Navier-Stokes/energy equations using primitive variables. Newton's method is used to linearize the discrete system, and a preconditioned Krylov projection technique is used to solve the resulting linear system. Domain decomposition enables the development of a global preconditioner via the parallel construction of contributions derived from subdomains. Formation of the global preconditioner is based upon additive and multiplicative Schwarz algorithms, with and without subdomain overlap. The degree of parallelism of this technique is further enhanced with the use of a matrix-free approximation for the Jacobian used in the Krylov technique (in this case, GMRES(k)). Of paramount interest to this study is the implementation and optimization of these techniques on parallel shared-memory hardware, namely the Cray C90 and SGI Challenge architectures. These architectures were chosen as representative and commonly available to researchers interested in the solution of problems of this type. The Newton-Krylov-Schwarz solution technique is increasingly being investigated for computational fluid dynamics (CFD) applications due to the advantages of full coupling of all variables and equations, rapid non-linear convergence, and moderate memory requirements. A parallel version of this method that scales effectively on the above architectures would be extremely attractive to practitioners, resulting in efficient, cost-effective, parallel solutions exhibiting the benefits of the solution technique.
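
The matrix-free Jacobian approximation at the heart of such Newton-Krylov methods rests on one identity: Jv ≈ (F(x + εv) − F(x))/ε, so the Krylov solver never needs the assembled Jacobian. The toy below demonstrates that identity on a 2×2 nonlinear system, using a direct 2×2 solve where the paper uses preconditioned GMRES(k); the residual F and all names are invented for illustration.

```python
# Matrix-free Newton iteration: Jacobian-vector products come from a
# finite difference of the residual, so J is never formed analytically.
def F(x):
    """Toy nonlinear residual with root (1, 2)."""
    return [x[0] ** 2 + x[1] - 3.0, x[0] + x[1] ** 2 - 5.0]

def jv(F, x, v, eps=1e-7):
    """Matrix-free product: J v ≈ (F(x + eps*v) - F(x)) / eps."""
    fx = F(x)
    fp = F([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(fp, fx)]

def newton(F, x, iters=20):
    for _ in range(iters):
        # For this 2x2 toy we recover J column-by-column from jv and solve
        # directly; a Krylov method such as GMRES(k) would call jv instead.
        c0, c1 = jv(F, x, [1, 0]), jv(F, x, [0, 1])
        a, c = c0
        b, d = c1
        f = F(x)
        det = a * d - b * c
        dx = [(-f[0] * d + f[1] * b) / det, (-f[1] * a + f[0] * c) / det]
        x = [xi + di for xi, di in zip(x, dx)]   # Newton update: J dx = -F
    return x

x = newton(F, [1.0, 1.0])
print([round(v, 6) for v in x])  # → [1.0, 2.0]
```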

  10. A Hybrid MPI/OpenMP Approach for Parallel Groundwater Model Calibration on Multicore Computers

    SciTech Connect (OSTI)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan; Parker, Jack C.; Watson, David B; Jardine, Philip M

    2010-01-01

    Groundwater model calibration is becoming increasingly computationally time intensive. We describe a hybrid MPI/OpenMP approach that exploits two levels of parallelism in software and hardware to reduce calibration time on multicore computers with minimal parallelization effort. First, HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for a uranium transport model with over a hundred species involved in nearly a hundred reactions, and for a field-scale coupled flow and transport model. In the first application, a single parallelizable loop is identified that consumes over 97% of the total computational time. With a few lines of OpenMP compiler directives inserted into the code, the computational time is reduced about tenfold on a compute node with 16 cores. The performance is further improved by selectively parallelizing a few more loops. For the field-scale application, parallelizable loops in 15 of the 174 subroutines in HGC5 are identified to take more than 99% of the execution time. By adding the preconditioned conjugate gradient solver and BICGSTAB, and using a coloring scheme to separate the elements, nodes, and boundary sides, the subroutines for finite element assembly, soil property update, and boundary condition application are parallelized, resulting in a speedup of about 10 on a 16-core compute node. The Levenberg-Marquardt (LM) algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, as many compute nodes as there are adjustable parameters (when forward differences are used for the Jacobian approximation), or twice that number (when central differences are used), reduce the calibration time from days and weeks to a few hours for the two applications. This approach can be extended to global optimization schemes and Monte Carlo analysis, where thousands of compute nodes can be efficiently utilized.
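
The MPI-level parallelization described above works because, with forward differences, each adjustable parameter's Jacobian column requires one independent forward-model run. The sketch below shows that column independence with a thread pool standing in for MPI ranks; the toy forward model and all names are invented for illustration.

```python
# One Jacobian column per worker: column j only needs f(p + h*e_j) and f(p),
# so columns are embarrassingly parallel across workers (MPI ranks in the
# paper; a thread pool here).
from concurrent.futures import ThreadPoolExecutor

def forward_model(p):
    """Stand-in for a forward simulation run; two outputs, two parameters."""
    return [p[0] * p[1], p[0] + 2.0 * p[1]]

def jacobian_column(f, p, j, h=1e-6):
    """Forward-difference column j: (f(p + h*e_j) - f(p)) / h."""
    base = f(p)
    pert = list(p)
    pert[j] += h
    return [(a - b) / h for a, b in zip(f(pert), base)]

def parallel_jacobian(f, p):
    with ThreadPoolExecutor(max_workers=len(p)) as pool:
        cols = list(pool.map(lambda j: jacobian_column(f, p, j), range(len(p))))
    return [list(row) for row in zip(*cols)]  # columns -> row-major J

J = parallel_jacobian(forward_model, [3.0, 4.0])
print([[round(v, 3) for v in row] for row in J])  # → [[4.0, 3.0], [1.0, 2.0]]
```

Central differences would double the runs per column (f(p + h e_j) and f(p − h e_j)), which is exactly why the abstract's node count doubles in that case.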

  11. Parallel-plate heat pipe apparatus having a shaped wick structure

    DOE Patents [OSTI]

    Rightley, Michael J.; Adkins, Douglas R.; Mulhall, James J.; Robino, Charles V.; Reece, Mark; Smith, Paul M.; Tigges, Chris P.

    2004-12-07

    A parallel-plate heat pipe is disclosed that utilizes a plurality of evaporator regions at locations where heat sources (e.g. semiconductor chips) are to be provided. A plurality of curvilinear capillary grooves are formed on one or both major inner surfaces of the heat pipe to provide an independent flow of a liquid working fluid to the evaporator regions to optimize heat removal from different-size heat sources and to mitigate the possibility of heat-source shadowing. The parallel-plate heat pipe has applications for heat removal from high-density microelectronics and laptop computers.

  12. Parallel pulse processing and data acquisition for high speed, low error flow cytometry

    DOE Patents [OSTI]

    van den Engh, Gerrit J. (Livermore, CA); Stokdijk, Willem (Livermore, CA)

    1992-01-01

    A digitally synchronized parallel pulse processing and data acquisition system for a flow cytometer has multiple parallel input channels with independent pulse digitization and FIFO storage buffers. A trigger circuit controls the pulse digitization on all channels. After an event has been stored in each FIFO, a bus controller moves the oldest entry from each FIFO buffer onto a common data bus. The trigger circuit generates an ID number for each FIFO entry, which is checked by an error detection circuit. The system has high speed and a low error rate.
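    The per-channel FIFO with trigger-ID error detection can be sketched in Python (class and function names are our own illustration, not the patent's design):

```python
from collections import deque

class ChannelFIFO:
    """One input channel: digitized pulses queue up in an independent FIFO."""
    def __init__(self) -> None:
        self.buf: deque = deque()

    def store(self, event_id: int, value: float) -> None:
        # The trigger circuit stamps each digitized pulse with an event ID.
        self.buf.append((event_id, value))

    def pop_oldest(self):
        return self.buf.popleft()

def assemble_event(channels):
    """Bus controller: move the oldest entry of every FIFO onto the common
    data bus, then verify all channels carry the same event ID (error check)."""
    entries = [ch.pop_oldest() for ch in channels]
    ids = {eid for eid, _ in entries}
    if len(ids) != 1:
        raise ValueError(f"channels out of sync: IDs {sorted(ids)}")
    return ids.pop(), [value for _, value in entries]
```

    Because every channel buffers independently, a fast channel never stalls a slow one; desynchronization only surfaces, and is caught, when entries are merged onto the bus.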

  13. A transputer-based list mode parallel system for digital radiography with 2D silicon detectors

    SciTech Connect (OSTI)

    Conti, M.; Russo, P.; Scarlatella, A. (Dipt. di Scienze Fisiche and INFN); Del Guerra, A. (Dipt. di Fisica and INFN); Mazzeo, A.; Mazzocca, N.; Russo, S. (Dipt. di Informatica e Sistemistica)

    1993-08-01

    The authors believe that a dedicated parallel computer system can represent an effective and flexible approach to the problem of list-mode acquisition and reconstruction of digital radiographic images obtained with a double-sided silicon microstrip detector. They present a Transputer-based implementation of a parallel system for data acquisition and image reconstruction from a silicon crystal with a 200 µm read-out pitch. They are currently developing a prototype of the system connected to a detector with a 10 mm² sensitive area.

  14. Parallel pulse processing and data acquisition for high speed, low error flow cytometry

    DOE Patents [OSTI]

    Engh, G.J. van den; Stokdijk, W.

    1992-09-22

    A digitally synchronized parallel pulse processing and data acquisition system for a flow cytometer has multiple parallel input channels with independent pulse digitization and FIFO storage buffers. A trigger circuit controls the pulse digitization on all channels. After an event has been stored in each FIFO, a bus controller moves the oldest entry from each FIFO buffer onto a common data bus. The trigger circuit generates an ID number for each FIFO entry, which is checked by an error detection circuit. The system has high speed and a low error rate. 17 figs.

  15. Optimizing Parallel Access to the BaBar Database System Using CORBA Servers

    Office of Scientific and Technical Information (OSTI)

    (Technical Report) Title: Optimizing Parallel Access to the BaBar Database System Using CORBA Servers. The BaBar Experiment collected around 20 TB of data during its first 6 months of running. Now, after 18 months, the data size exceeds 300 TB and, according to prognosis, it is a small fraction of the size of the data coming in the next few months. In order to keep up with …

  17. Nano-optical observation of cascade switching in a parallel superconducting nanowire single photon detector

    SciTech Connect (OSTI)

    Heath, Robert M.; Tanner, Michael G.; Casaburi, Alessandro; Hadfield, Robert H.; Webster, Mark G.; San Emeterio Alvarez, Lara; Jiang, Weitao; Barber, Zoe H.; Warburton, Richard J.

    2014-02-10

    The device physics of parallel-wire superconducting nanowire single photon detectors is based on a cascade process. Using nano-optical techniques and a parallel-wire device with spatially separate pixels, we explicitly demonstrate the single- and multi-photon triggering regimes. We develop a model describing the efficiency of a detector operating in the arm-trigger regime. We investigate the timing response of the detector when illuminating a single pixel and when illuminating two pixels. We see a change in the active area of the detector between the two regimes and find the two-pixel trigger regime to have a faster timing response than the one-pixel regime.

  18. Dual Loop Parallel/Series Waste Heat Recovery System | Department of Energy

    Office of Energy Efficiency and Renewable Energy (EERE) Indexed Site

    Dual Loop Parallel/Series Waste Heat Recovery System. This system captures all the jacket water, intercooler, and exhaust heat from the engine by utilizing a single condenser to reject leftover heat to the atmosphere. (PDF: p-04_cook.pdf)

  19. Methods, systems, and computer program products for implementing function-parallel network firewalls

    DOE Patents [OSTI]

    Fulp, Errin W. (Winston-Salem, NC); Farley, Ryan J. (Winston-Salem, NC)

    2011-10-11

    Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.
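    The partition-and-combine idea in the claim can be sketched in Python (the rule representation and the default-deny policy are our illustrative assumptions, not the patent's specification):

```python
from dataclasses import dataclass

@dataclass
class Rule:
    dst_port: int
    action: str  # "allow" or "deny"

def partition(rules, n_nodes):
    """Split the rule set so each node holds a disjoint portion and the
    portions together cover every rule, as the claim requires."""
    return [rules[i::n_nodes] for i in range(n_nodes)]

def node_filter(portion, packet_port):
    """Each firewall node checks a packet only against its own portion."""
    for rule in portion:
        if rule.dst_port == packet_port:
            return rule.action
    return None  # no rule in this portion matched

def combined_verdict(portions, packet_port):
    """Combine per-node verdicts; default-deny when no rule matches."""
    verdicts = [node_filter(p, packet_port) for p in portions]
    if "deny" in verdicts:
        return "deny"
    if "allow" in verdicts:
        return "allow"
    return "deny"
```

    A real function-parallel firewall must also preserve the rule set's first-match ordering across nodes; this sketch sidesteps that by letting deny verdicts dominate, which is one simple (but not the only) way to combine per-node results.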

  20. Zori 1.0: A Parallel Quantum Monte Carlo Electronic Structure Package

    Office of Scientific and Technical Information (OSTI)

    (Journal Article) Title: Zori 1.0: A Parallel Quantum Monte Carlo Electronic Structure Package. No abstract prepared. Authors: Aspuru-Guzik, Alan; Salomon-Ferrer, Romelia; Austin, Brian; Perusquia-Flores, Raul; Griffin, Mary A.; Oliva, Ricardo A.; Skinner, David; Domin, Dominik; Lester Jr., William A.