DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Debugging a high performance computing program

Abstract

Methods, apparatus, and computer program products are disclosed for debugging a high performance computing program by gathering lists of addresses of calling instructions for a plurality of threads of execution of the program, assigning the threads to groups in dependence upon the addresses, and displaying the groups to identify defective threads.
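The grouping step in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the thread ids, and the sample addresses are all hypothetical, and it assumes the per-thread lists of calling-instruction addresses have already been gathered by some other means. Threads with identical address lists fall into the same group, and the smallest groups are shown first because a thread that diverges from the majority is a natural first suspect.

```python
from collections import defaultdict


def group_threads_by_call_addresses(call_lists):
    """Map each distinct sequence of calling-instruction addresses to the
    sorted list of thread ids that share it (hypothetical helper)."""
    groups = defaultdict(list)
    for thread_id, addresses in call_lists.items():
        groups[tuple(addresses)].append(thread_id)
    return {addrs: sorted(ids) for addrs, ids in groups.items()}


def format_groups(groups):
    """Return one display line per group, smallest group first."""
    ordered = sorted(groups.items(), key=lambda kv: (len(kv[1]), kv[0]))
    lines = []
    for addrs, ids in ordered:
        path = " > ".join(f"{a:#010x}" for a in addrs)
        lines.append(f"{len(ids):>4} thread(s) {ids}: {path}")
    return lines


# Hypothetical sample data: thread 2 stopped on a different call path.
call_lists = {
    0: [0x10000400, 0x10000A20, 0x10001F00],
    1: [0x10000400, 0x10000A20, 0x10001F00],
    2: [0x10000400, 0x10000B64],
    3: [0x10000400, 0x10000A20, 0x10001F00],
}

groups = group_threads_by_call_addresses(call_lists)
for line in format_groups(groups):
    print(line)
```

With thousands of threads, a display of this kind reduces the inspection task from one call stack per thread to one line per distinct call path.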

Inventors:
Gooding, Thomas M.
Issue Date:
2014-08-19
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1150217
Patent Number(s):
8813037
Application Number:
13/780,215
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
B519700
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Gooding, Thomas M. Debugging a high performance computing program. United States: N. p., 2014. Web.
Gooding, Thomas M. Debugging a high performance computing program. United States.
Gooding, Thomas M. 2014. "Debugging a high performance computing program". United States. https://www.osti.gov/servlets/purl/1150217.
@article{osti_1150217,
title = {Debugging a high performance computing program},
author = {Gooding, Thomas M.},
abstractNote = {Methods, apparatus, and computer program products are disclosed for debugging a high performance computing program by gathering lists of addresses of calling instructions for a plurality of threads of execution of the program, assigning the threads to groups in dependence upon the addresses, and displaying the groups to identify defective threads.},
place = {United States},
year = {2014},
month = aug
}
