DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes

Abstract

In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system use a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized on large processor-count SPMD bulk-synchronous programming styles.
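
The effect described in the abstract can be illustrated with a toy model. The sketch below is not the patented method or any real scheduler interface: it is a tick-based simulation in which every name (`job_ticks`, `compare`) and every number (64 nodes, 10 ticks of work per step, a 5-tick daemon window every 100 ticks) is an assumption chosen only for illustration. Each node runs one task of a bulk-synchronous job, and each step ends at a barrier, so a step lasts as long as its slowest node. When the daemon windows fall at unrelated times on different nodes, nearly every step is delayed by some node; when they are aligned across nodes, the interruptions overlap and the job pays for each one only once.

```python
import random


def job_ticks(phases, steps=200, work=10, period=100, duration=5):
    """Total ticks for a bulk-synchronous job with one task per node.

    phases[i] shifts node i's periodic daemon window. All default values
    are illustrative assumptions, not figures from the patent.
    """
    t = 0  # time at which the current step starts on every node
    for _ in range(steps):
        finish = []
        for phase in phases:
            now, remaining = t, work
            while remaining > 0:
                # The daemon owns the CPU for the first `duration` ticks of
                # each period; the task only progresses outside that window.
                if (now + phase) % period >= duration:
                    remaining -= 1
                now += 1
            finish.append(now)
        t = max(finish)  # barrier: the step ends when the slowest node arrives
    return t


def compare(nodes=64, seed=0):
    """Return (uncoordinated_ticks, coscheduled_ticks) for the toy job."""
    rng = random.Random(seed)
    # Uncoordinated: each node's daemon window starts at a random offset
    # within the default 100-tick period.
    uncoordinated = [rng.randrange(100) for _ in range(nodes)]
    # Co-scheduled: daemon windows are aligned across all nodes.
    coscheduled = [0] * nodes
    return job_ticks(uncoordinated), job_ticks(coscheduled)


if __name__ == "__main__":
    unc, co = compare()
    print("uncoordinated:", unc, "ticks; co-scheduled:", co, "ticks")
```

In this model the co-scheduled run costs the same as a single node running alone, regardless of node count, while the uncoordinated run grows toward one full interruption per step as nodes are added.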

Inventors:
 Jones, Terry R [1]; Watson, Pythagoras C [1]; Tuel, William [2]; Brenner, Larry [3]; Caffrey, Patrick [4]; Fier, Jeffrey [1]
  1. Livermore, CA
  2. Kingston, NY
  3. Austin, TX
  4. Saugerties, NY
Issue Date:
Tue Oct 05 2010
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1016148
Patent Number(s):
7810093
Application Number:
10/989,704
Assignee:
Lawrence Livermore National Security, LLC (Livermore, CA)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
W-7405-ENG-48
Resource Type:
Patent
Country of Publication:
United States
Language:
English

Citation Formats

Jones, Terry R, Watson, Pythagoras C, Tuel, William, Brenner, Larry, Caffrey, Patrick, and Fier, Jeffrey. Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes. United States: N. p., 2010. Web.
Jones, Terry R, Watson, Pythagoras C, Tuel, William, Brenner, Larry, Caffrey, Patrick, & Fier, Jeffrey (2010). Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes. United States.
Jones, Terry R, Watson, Pythagoras C, Tuel, William, Brenner, Larry, Caffrey, Patrick, and Fier, Jeffrey. 2010. "Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes". United States. https://www.osti.gov/servlets/purl/1016148.
@article{osti_1016148,
title = {Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes},
author = {Jones, Terry R and Watson, Pythagoras C and Tuel, William and Brenner, Larry and Caffrey, Patrick and Fier, Jeffrey},
abstractNote = {In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system use a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized on large processor-count SPMD bulk-synchronous programming styles.},
place = {United States},
year = {2010},
month = {10}
}

Works referenced in this record:

Effective distributed scheduling of parallel workloads
conference, January 1996

  • Dusseau, Andrea C.; Arpaci, Remzi H.; Culler, David E.
  • Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems - SIGMETRICS '96
  • https://doi.org/10.1145/233013.233020

Fast collective operations using shared and remote memory access protocols on clusters
conference, January 2003

  • Tipparaju, V.; Nieplocha, J.; Panda, D.
  • International Parallel and Distributed Processing Symposium (IPDPS 2003), Proceedings International Parallel and Distributed Processing Symposium
  • https://doi.org/10.1109/IPDPS.2003.1213188

Operating system support for parallel programming on RP3
journal, September 1991


Dynamic coscheduling on workstation clusters
book, January 1998