OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Towards Millions of Communicating Threads

Authors: Dang, Hoang-Vu; Snir, Marc; Gropp, William
Publication Date:
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
DOE Contract Number:
Resource Type:
Resource Relation:
Conference: 2016 EuroMPI Edinburgh, 09/02/16 - 09/28/16, Edinburgh, Scotland, GB
Country of Publication:
United States
Subject:
MPI; Message Passing Interface; communication; concurrent execution; multi-threading; runtime system

Citation Formats

Dang, Hoang-Vu, Snir, Marc, and Gropp, William. Towards Millions of Communicating Threads. United States: N. p., 2016. Web. doi:10.1145/2966884.2966914.
Dang, Hoang-Vu, Snir, Marc, & Gropp, William. Towards Millions of Communicating Threads. United States. doi:10.1145/2966884.2966914.
Dang, Hoang-Vu, Snir, Marc, and Gropp, William. 2016. "Towards Millions of Communicating Threads". United States. doi:10.1145/2966884.2966914.
title = {Towards Millions of Communicating Threads},
author = {Dang, Hoang-Vu and Snir, Marc and Gropp, William},
abstractNote = {},
doi = {10.1145/2966884.2966914},
journal = {},
number = ,
volume = ,
place = {United States},
year = 2016,
month = 9

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

  • Abstract not provided.
  • Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and to process application data. In addition, at such scales, each tool itself will become a large parallel application--already, debugging the full Blue-Gene/L (BG/L) installation at the Lawrence Livermore National Laboratory requires employing 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an Infiniband cluster and results up to 208K processes on BG/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.
  • As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. Finally, this paper details the process by which Alpgen was adapted from a single-processor serial application to a large-scale parallel application and the performance that was achieved.
  • Glashow's Nobel lecture is reprinted in which the author reviews the development of today's "standard theory" of elementary particles. (AIP)