Recent improvements in the performance of the multitasked TORT on time-shared Cray computers
Coarse-grained angular domain decomposition of the mesh sweep algorithm has been implemented in ORNL's three-dimensional transport code TORT for Cray's macrotasking environment on platforms running the UNICOS operating system. A performance model constructed earlier is reviewed, and its main result, the identification of the sources of parallelization overhead, is used to motivate the present work. The sources of overhead treated here are: redundant operations in the angular loop across participating tasks; repetitive task creation; and lock utilization to prevent overwriting the flux moment arrays accumulated by the participating tasks. A substantial reduction in the parallelization overhead is demonstrated via sample runs with fixed tuning, i.e. zero CPU hold time. In some test problems, the wall clock speedup improves by up to 50% over the previous implementation with autotuning.
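The lock overhead named in the abstract can be illustrated with a minimal sketch. It is not TORT's implementation: it uses Python threads in place of Cray macrotasking, the sweep is a placeholder, and every name (`sweep_direction`, `accumulate_with_lock`, `accumulate_private`) is hypothetical. The record does not say how the lock overhead was reduced; per-task private buffers followed by a serial reduction are shown only as one common alternative to a shared, lock-protected array.

```python
import threading


def sweep_direction(weight, n_cells):
    # Placeholder for a mesh sweep along one discrete direction (assumption):
    # returns that direction's weighted contribution to each cell.
    return [weight * (cell + 1) for cell in range(n_cells)]


def run_tasks(task, directions, n_tasks):
    # Angular domain decomposition: task t handles directions t, t+n_tasks, ...
    threads = [
        threading.Thread(target=task, args=(t, directions[t::n_tasks]))
        for t in range(n_tasks)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()


def accumulate_with_lock(directions, n_tasks, n_cells):
    # All tasks add into one shared array, serialized by a lock.
    flux = [0.0] * n_cells
    lock = threading.Lock()

    def task(_tid, chunk):
        for w in chunk:
            psi = sweep_direction(w, n_cells)
            with lock:
                for i, v in enumerate(psi):
                    flux[i] += v

    run_tasks(task, directions, n_tasks)
    return flux


def accumulate_private(directions, n_tasks, n_cells):
    # Each task adds into its own buffer; no lock is needed during the sweep.
    partial = [[0.0] * n_cells for _ in range(n_tasks)]

    def task(tid, chunk):
        buf = partial[tid]
        for w in chunk:
            for i, v in enumerate(sweep_direction(w, n_cells)):
                buf[i] += v

    run_tasks(task, directions, n_tasks)
    # Serial reduction over the per-task buffers after all tasks finish.
    return [sum(column) for column in zip(*partial)]


if __name__ == "__main__":
    weights = [0.125] * 8  # hypothetical quadrature weights summing to 1
    print(accumulate_with_lock(weights, n_tasks=4, n_cells=4))
    print(accumulate_private(weights, n_tasks=4, n_cells=4))
```

Both functions return the same accumulated values; the private-buffer variant trades extra memory (one array per task) for the absence of lock contention inside the angular loop.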
- Research Organization:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE, Washington, DC (United States)
- DOE Contract Number:
- AC05-96OR22464
- OSTI ID:
- 425353
- Report Number(s):
- CONF-961245-3; ON: DE97001732; TRN: 97:003072
- Resource Relation:
- Conference: Seminar on 3D deterministic radiation transport computer programs, features, applications and perspectives, Paris (France), 2-3 Dec 1996; Other Information: PBD: [1996]
- Country of Publication:
- United States
- Language:
- English
Similar Records
Macrotasking the singular value decomposition of block circulant matrices on the Cray-2
Parallel performance of TORT on the CRAY J90: Model and measurement