OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Computing Derivatives for UQ on Emerging Manycore Architectures.

Abstract

Abstract not provided.

Authors:
Phipps, Eric T.; Edwards, Harold C.
Publication Date:
2017-02-01
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1427064
Report Number(s):
SAND2017-1980C
651395
DOE Contract Number:
AC04-94AL85000
Resource Type:
Conference
Resource Relation:
Conference: Proposed for presentation at SIAM CSE 2017, held February 27 - March 3, 2017, in Atlanta, GA.
Country of Publication:
United States
Language:
English

Citation Formats

Phipps, Eric T., and Edwards, Harold C. Computing Derivatives for UQ on Emerging Manycore Architectures. United States: N. p., 2017. Web.
Phipps, Eric T., & Edwards, Harold C. Computing Derivatives for UQ on Emerging Manycore Architectures. United States.
Phipps, Eric T., and Edwards, Harold C. 2017. "Computing Derivatives for UQ on Emerging Manycore Architectures." United States. https://www.osti.gov/servlets/purl/1427064.
@article{osti_1427064,
title = {Computing Derivatives for UQ on Emerging Manycore Architectures},
author = {Phipps, Eric T. and Edwards, Harold C.},
abstractNote = {Abstract not provided.},
url = {https://www.osti.gov/servlets/purl/1427064},
place = {United States},
year = {2017},
month = {feb}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
