Debugging a high performance computing program
Abstract
Methods, apparatus, and computer program products are disclosed for debugging a high performance computing program by gathering lists of addresses of calling instructions for a plurality of threads of execution of the program, assigning the threads to groups in dependence upon the addresses, and displaying the groups to identify defective threads.
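The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes each thread's list of calling-instruction addresses has already been gathered, and it assumes that threads are grouped only when their address lists are identical. The names `group_threads_by_call_stack`, `format_groups`, and `sample_stacks` are hypothetical, and the addresses are made up for illustration.

```python
from collections import defaultdict


def group_threads_by_call_stack(stacks):
    """Group thread IDs by their list of calling-instruction addresses.

    `stacks` maps a thread ID to its list of addresses (hypothetical input
    format). Assumption: two threads share a group only if their address
    lists match exactly.
    """
    groups = defaultdict(list)
    for thread_id, addresses in stacks.items():
        groups[tuple(addresses)].append(thread_id)
    # Smallest groups first: a thread that is alone on an unusual call path
    # is a natural candidate for closer inspection.
    return sorted(groups.items(), key=lambda item: (len(item[1]), item[0]))


def format_groups(sorted_groups):
    """Render each group as one display line."""
    lines = []
    for addresses, thread_ids in sorted_groups:
        path = " > ".join(hex(a) for a in addresses)
        lines.append(f"{len(thread_ids)} thread(s) {sorted(thread_ids)}: {path}")
    return lines


# Made-up sample data: three threads follow the same call path, one diverges.
sample_stacks = {
    0: [0x1000, 0x1040, 0x2000],
    1: [0x1000, 0x1040, 0x2000],
    2: [0x1000, 0x1040, 0x2000],
    3: [0x1000, 0x1080, 0x3000],
}

if __name__ == "__main__":
    for line in format_groups(group_threads_by_call_stack(sample_stacks)):
        print(line)
```

In this sketch the lone thread on a different call path is listed first, which is one plausible way a display of groups could draw attention to a possibly defective thread.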
- Inventors:
- Issue Date:
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1150217
- Patent Number(s):
- 8813037
- Application Number:
- 13/780,215
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B519700
- Resource Type:
- Patent
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Gooding, Thomas M. Debugging a high performance computing program. United States: N. p., 2014. Web.
Gooding, Thomas M. "Debugging a high performance computing program". United States. https://www.osti.gov/servlets/purl/1150217.
@article{osti_1150217,
title = {Debugging a high performance computing program},
author = {Gooding, Thomas M.},
abstractNote = {Methods, apparatus, and computer program products are disclosed for debugging a high performance computing program by gathering lists of addresses of calling instructions for a plurality of threads of execution of the program, assigning the threads to groups in dependence upon the addresses, and displaying the groups to identify defective threads.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2014},
month = {8}
}
Works referenced in this record:
Fault detection and redundancy management system
patent, January 1987
- Julich, Paul M.; Pearce, Jeffrey B.
- US Patent Document 4,634,110
Binary tree parallel processor
patent, August 1989
- Stolfo, Salvatore J.; Miranker, Daniel Paul
- US Patent Document 4,860,201
Parallel computer system
patent, July 1994
- Douglas, David C.; Ganmukhi, Mahesh N.; Hill, Jeffrey V.
- US Patent Document 5,333,268
Torus networking method and apparatus having a switch for performing an I/O operation with an external device and changing torus size
patent, March 1998
- Hayashi, Kenichi
- US Patent Document 5,729,756
Computer method for updating a network design
patent, October 1998
- Tonelli, Daniel L.; Maloney, Kevin M.; Cronin, Kevin W.
- US Patent Document 5,821,937
Apparatus region-based detection of interference among reordered memory operations in a processor
patent, June 1999
- Moreno, Jaime; Moudgill, Mayan
- US Patent Document 5,918,005
Distributed method and system for excluding components from a restoral route in a communications network
patent, August 1999
- Croslin, William D.; Sellers, Steve; Sees, Mark W.
- US Patent Document 5,941,992
Integrated management of multiple networks with different topologies
patent, September 1999
- Wong, Norman; Bellinger, Doug; Freen, Russ
- US Patent Document 5,953,347
Method and apparatus for run-time memory access checking and memory leak detection of a multi-threaded program
patent, September 1999
- Rishi, Alok; Masamitsu, Jon A.
- US Patent Document 5,953,530
System for method for performing a context switch operation in a massively parallel computer system
patent, April 2000
- Spiller, Cynthia J.
- US Patent Document 6,047,122
Apparatus and methods for connecting modules using remote switching
patent, March 2001
- Carvey, Philip P.; Dally, William J.; Dennison, Larry R.
- US Patent Document 6,205,532
Tree network including arrangement for establishing sub-tree having a logical root below the network's physical root
patent, September 2002
- Ganmukhi, Mahesh N.; Hill, Jeffrey V.; Wong-Chan, Monica C.
- US Patent Document 6,449,667
Method of identifying low quality links in a telecommunications network
patent, November 2004
- Shah, Jasvantrai C.
- US Patent Document 6,813,240
Mesh protection service in a communications network
patent, January 2005
- Desai, Premal; Lu, Biao; Tedijanto, Theodore Ernest
- US Patent Document 6,848,062
Peer-to-peer fault detection
patent, April 2005
- Mora, Oscar; Pinate, Roger; Ponticelli, Roberto
- US Patent Document 6,880,100
Selective protection for ring topologies
patent, May 2005
- Bruckman, Leon
- US Patent Document 6,892,329
Communication network and protocol which can efficiently maintain transmission across a disrupted network
patent, June 2005
- Mahalingaiah, Rupaka
- US Patent Document 6,912,196
Routing scheme using preferred paths in a multi-path interconnection fabric in a storage network
patent, February 2006
- Lee, Whay Sing; Rettberg, Randall D.
- US Patent Document 7,007,189
Discovery of nodes in an interconnection fabric
patent, April 2006
- Lee, Whay Sing; Mortensen, Thomas M.
- US Patent Document 7,027,413
Application manager for monitoring and recovery of software based application processes
patent, April 2006
- Maso, Brian; Noy, Oded
- US Patent Document 7,028,225
Message routing in a torus interconnect
patent, July 2006
- Lee, Whay Sing; Talagala, Nisha; Chong, Jr., Fay
- US Patent Document 7,080,156
Deterministic error recovery protocol
patent, December 2006
- Blumrich, Matthias A.; Chen, Dong; Gara, Alan
- US Patent Document 7,149,920
Identifying faulty network components during a network exploration
patent, April 2007
- Bender, Carl A.; Rash, Nicholas P.
- US Patent Document 7,200,118
Fault isolation through no-overhead link level CRC
patent, April 2007
- Chen, Dong; Coteus, Paul W.; Gara, Alan
- US Patent Document 7,210,088
Inter-working mesh telecommunications networks
patent, October 2007
- Chow, Timothy; Lin, Philip J.; Mills, James D.
- US Patent Document 7,289,428
Directing a path verification request along a specific path to a mesh network switch to test operability of the specific path
patent, June 2008
- Wakumoto, Shaun; Bare, Ballard C.; Ersoy, Cetin
- US Patent Document 7,382,734
Connection set-up extension for restoration path establishment in mesh networks
patent, November 2008
- Doshi, Bharat Tarachand; Dziong, Zbigniew M.; Nagarajan, Ramesh
- US Patent Document 7,451,340
Transferring data in a parallel processing environment
patent, December 2008
- Wentzlaff, David
- US Patent Document 7,461,236
Network designing device and computer-readable medium
patent, March 2009
- Nakashima, Hisao; Hoshida, Takeshi; Akiyama, Yuichi
- US Patent Document 7,505,414
Multi-directional fault detection system
patent, March 2009
- Archer, Charles J.; Pinnow, Kurt Walter; Ratterman, Joseph D.
- US Patent Document 7,506,197
Cell boundary fault detection system
patent, May 2009
- Archer, Charles J.; Pinnow, Kurt Walter; Ratterman, Joseph D.
- US Patent Document 7,529,963
Bisectional fault detection system
patent, August 2009
- Archer, Charles J.; Pinnow, Kurt Walter; Ratterman, Joseph D.
- US Patent Document 7,571,345
Executing scatter operation to parallel computer nodes by repeatedly broadcasting content of send buffer partition corresponding to each node upon bitwise OR operation
patent, October 2009
- Archer, Charles J.; Ratterman, Joseph D.
- US Patent Document 7,600,095
Locating hardware faults in a data communications network of a parallel computer
patent, January 2010
- Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.
- US Patent Document 7,646,721
Row fault detection system
patent, February 2010
- Archer, Charles J.; Pinnow, Kurt Walter; Ratterman, Joseph D.
- US Patent Document 7,669,075
Global tree network for computing structures
patent-application, April 2004
- Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.
- US Patent Application 10/469000; 20040078493
Method and system of interconnecting processors of a parallel computer to facilitate torus partitioning
patent-application, June 2005
- Stockmeyer, Larry J.
- US Patent Application 10/734340; 20050132163
Optimizing layout of an application on a massively parallel supercomputer
patent-application, May 2006
- Bhanot, Gyan V.; Gara, Alan; Heidelberger, Philip
- US Patent Application 10/963101; 20060101104
Method, system and program product for communicating among processes in a symmetric multi-processing cluster environment
patent-application, July 2007
- Jia, Bin; Treumann, Richard R.
- US Patent Application 11/282011; 20070174558
Computer Hardware Fault Diagnosis
patent-application, October 2007
- Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.
- US Patent Application 11/279573; 20070242611
Executing an Allgather Operation on a Parallel Computer
patent-application, October 2007
- Archer, Charles J.; Moreira, Jose F.; Ratterman, Joseph D.
- US Patent Application 11/279620; 20070245122
Executing a Scatter Operation on a Parallel Computer
patent-application, October 2008
- Archer, Charles J.; Ratterman, Joseph D.
- US Patent Application 11/737286; 20080263320
Parallel-Prefix Broadcast for a Parallel-Prefix Operation on a Parallel Computer
patent-application, October 2008
- Archer, Charles J.; Peters, Amanda; Ricard, Gary R.
- US Patent Application 11/737209; 20080263329
Fault recovery on a parallel computer system with a torus network
patent-application, October 2008
- Darrington, David L.; McCarthy, Patrick Joseph; Peters, Amanda
- US Patent Application 11/736923; 20080263387
Link Failure Detection in a Parallel Computer
patent-application, February 2009
- Archer, Charles J.; Blocksome, Michael A.; Megerian, Mark G.
- US Patent Application 11/832940; 20090037773
Performing Collective Operations in a Distributed Processing System
patent-application, March 2013
- Archer, Charles J.; Carey, James E.; Markland, Matthew W.
- US Patent Application 13/679133; 20130081037
An Overview of the BlueGene/L Supercomputer
conference, January 2002
- Adiga, N. R.; Almasi, G.; Almasi, G. S.
- ACM/IEEE SC 2002 Conference (SC'02)