OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Extending Catamount for multi-core processors.


No abstract prepared.

Publication Date: May 1, 2007
Research Org.: Sandia National Laboratories
TRN: US200722%%784
Resource Relation: Conference: Proposed for presentation at the Cray Users Group held May 9, 2007 in Seattle, WA.
Country of Publication: United States

Citation Formats

Kelly, Suzanne Marie, VanDyke, John P., and Vaughan, Courtenay Thomas. Extending Catamount for multi-core processors. United States: N. p., 2007. Web.
Kelly, Suzanne Marie, VanDyke, John P., & Vaughan, Courtenay Thomas. Extending Catamount for multi-core processors. United States.
Kelly, Suzanne Marie, VanDyke, John P., and Vaughan, Courtenay Thomas. 2007. "Extending Catamount for multi-core processors." United States.
title = {Extending {Catamount} for multi-core processors},
author = {Kelly, Suzanne Marie and VanDyke, John P. and Vaughan, Courtenay Thomas},
abstractNote = {No abstract prepared.},
place = {United States},
year = {2007},
month = {May}


Similar records:
  • Biological processes occurring inside cells involve multiple scales of time and length; many popular theoretical and computational multi-scale techniques use biomolecular simulations based on molecular dynamics. Until recently, the computing power required to simulate the relevant scales was beyond the reach of even the fastest supercomputers. The availability of petaFLOPS-scale computing power in the near future holds great promise. Unfortunately, biosimulation software technology has not kept up with the changes in hardware. In particular, with the introduction of multi-core processing technologies in systems with tens of thousands of processing cores, it is unclear whether existing biomolecular simulation frameworks will be able to scale and use these resources effectively. While multi-core processing systems provide greater processing capability, their memory and I/O subsystems pose new challenges to application and system software developers. In this preliminary study, we attempt to characterize the computation, communication, and memory efficiencies of biomolecular simulations on a Cray XT3 system that was recently upgraded to dual-core Opteron processors. We find that application efficiency on the multi-core processors decreases as the simulated system size increases. Further, we measure the communication overhead of using both cores in a processor simultaneously and find that MPI communication performance can drop to as low as 50% of that measured in single-core execution. We conclude that not only do biomolecular simulations need to be aware of the underlying multi-core hardware to achieve maximum performance, but the system software also needs to provide processor and memory placement features on high-end systems. Our results on a stand-alone dual-core AMD system confirm that combining processor and memory affinity schemes can yield performance gains of over 12%.
  • Numerous applications require the exploration of large graphs. The problem has been tackled in the past through a variety of solutions, based either on commodity processors or on dedicated hardware. Multi-core processors such as the Cell Broadband Engine (CBE) are gaining popularity as basic building blocks for high-performance clusters. Nevertheless, no studies have yet investigated how effectively the CBE architecture can explore large graphs, or how its performance compares with other architectural solutions. In this paper, we describe the challenges and design choices involved in mapping a breadth-first search (BFS) algorithm onto the CBE. Our implementation has been driven by an accurate performance model, which has allowed seamless coordination between on-chip communication, off-chip memory access, and computation. Preliminary results obtained on a pre-production prototype running at 2.4 GHz show almost linear speedups when using multiple synergistic processing units and impressive levels of performance when compared to other processors.
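For readers unfamiliar with the algorithm named in the last abstract, a common level-synchronous formulation of breadth-first search is sketched below. This is a generic sequential sketch, not the authors' CBE implementation; the function name `bfs_levels` and the dict-of-lists adjacency representation are hypothetical choices for illustration. Each iteration expands the entire current frontier into the next one, and that per-frontier expansion is the work a parallel version would typically divide among cores.

```python
def bfs_levels(adj, source):
    """Return a dict mapping each reachable vertex to its BFS level (hop count).

    Hypothetical sketch: `adj` is assumed to map each vertex to an iterable
    of neighbor vertices; vertices missing from `adj` have no out-edges.
    """
    level = {source: 0}   # visited set and distance map in one structure
    frontier = [source]   # vertices discovered at the current depth
    depth = 0
    while frontier:
        next_frontier = []
        # Expand every vertex of the current frontier. In a parallel version,
        # this loop is the part that would be partitioned across cores.
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in level:       # first visit fixes the level
                    level[v] = depth + 1
                    next_frontier.append(v)
        frontier = next_frontier
        depth += 1
    return level


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [], 5: []}
    print(bfs_levels(graph, 0))  # prints {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3}
```

Keeping the frontiers as explicit lists, rather than a single FIFO queue, makes the level boundaries visible; those boundaries are the natural synchronization points when the expansion step is distributed over several processing units.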