U.S. Department of Energy
Office of Scientific and Technical Information

Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes

Patent · OSTI ID:1016148

In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system use a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes, promoting intra-node and inter-node overlap of those interfering activities as well as intra-node and inter-node overlap of the synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized for large processor-count jobs written in the SPMD bulk-synchronous programming style.
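
The benefit of overlapping interruptions across nodes can be illustrated with a toy model. The sketch below is not the patented method: it assumes a bulk-synchronous job in which every step ends in a barrier, and it assumes each node suffers exactly one fixed-length interruption during the run. The names `bsp_runtime`, `scattered`, and `aligned`, and all numeric values, are hypothetical and chosen only for illustration.

```python
def bsp_runtime(interrupt_steps, num_steps, compute=1.0, interrupt=0.5):
    """Total time of a bulk-synchronous run where every step ends in a barrier.

    interrupt_steps[i] is the set of steps at which node i is interrupted
    (an assumed stand-in for daemon or timer activity).
    """
    total = 0.0
    for step in range(num_steps):
        # The barrier releases only when the slowest node arrives,
        # so one interrupted node delays every other node in that step.
        slowest = max(
            compute + (interrupt if step in steps else 0.0)
            for steps in interrupt_steps
        )
        total += slowest
    return total


NUM_NODES, NUM_STEPS = 8, 8

# Uncoordinated: each node's interruption lands in a different step.
scattered = [{node % NUM_STEPS} for node in range(NUM_NODES)]

# Co-scheduled: every node takes its interruption in the same step.
aligned = [{0} for _ in range(NUM_NODES)]

scattered_time = bsp_runtime(scattered, NUM_STEPS)
aligned_time = bsp_runtime(aligned, NUM_STEPS)
print(scattered_time, aligned_time)
```

In this model the scattered schedule pays the interruption cost in every step, while the aligned schedule pays it once, which is the intuition behind promoting inter-node overlap of interfering activities.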

Research Organization: Lawrence Livermore National Security, LLC (Livermore, CA)
Sponsoring Organization: USDOE
DOE Contract Number: W-7405-ENG-48
Assignee: Lawrence Livermore National Security, LLC (Livermore, CA)
Patent Number(s): 7,810,093
Application Number: 10/989,704
OSTI ID: 1016148
Country of Publication: United States
Language: English
