Recent improvements in the performance of the multitasked TORT on time-shared Cray computers
Coarse-grained angular domain decomposition of the mesh sweep algorithm has been implemented in ORNL's three-dimensional transport code TORT for Cray's macrotasking environment on platforms running the UNICOS operating system. A performance model constructed earlier is reviewed, and its main result, the identification of the sources of parallelization overhead, is used to motivate the present work. The sources of overhead treated here are: redundant operations in the angular loop across participating tasks; repetitive task creation; and lock utilization to prevent overwriting the flux moment arrays accumulated by the participating tasks. Substantial reduction in the parallelization overhead is demonstrated via sample runs with fixed tuning, i.e., zero CPU hold time. Up to 50% improvement in the wall clock speedup over the previous implementation with autotuning is observed in some test problems.
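The abstract does not say how the three overhead sources were reduced in TORT, so the following is only a minimal sketch of one common pattern, written in Python rather than in TORT's own language. It assumes that each task owns a contiguous block of discrete directions, accumulates its contribution to the scalar flux in a private array (so no lock is needed during the sweep), and that a single worker pool is reused across iterations instead of being recreated. All names, including `toy_angular_flux`, are hypothetical stand-ins and are not taken from TORT.

```python
from concurrent.futures import ThreadPoolExecutor


def split_directions(n_dirs, n_tasks):
    """Split direction indices 0..n_dirs-1 into n_tasks contiguous chunks."""
    base, extra = divmod(n_dirs, n_tasks)
    chunks, start = [], 0
    for t in range(n_tasks):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def toy_angular_flux(direction, cell):
    """Hypothetical stand-in for the mesh sweep result of one direction in one cell."""
    return float(direction + 1) * (cell + 1)


def sweep_chunk(directions, weights, n_cells):
    """Accumulate one task's share of the scalar flux in a private array."""
    partial = [0.0] * n_cells
    for m in directions:
        for i in range(n_cells):
            partial[i] += weights[m] * toy_angular_flux(m, i)
    return partial


def scalar_flux(weights, n_cells, pool, n_tasks):
    """Sweep all directions in parallel, then sum the private arrays serially."""
    chunks = split_directions(len(weights), n_tasks)
    futures = [pool.submit(sweep_chunk, c, weights, n_cells) for c in chunks]
    total = [0.0] * n_cells
    for f in futures:
        for i, value in enumerate(f.result()):
            total[i] += value
    return total


if __name__ == "__main__":
    weights = [0.25] * 4  # assumed quadrature weights, illustration only
    # The pool is created once and reused for every outer iteration.
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(3):
            flux = scalar_flux(weights, 3, pool, 2)
    print(flux)
```

The design trade-off in this sketch is extra memory (one flux array per task) and a short serial reduction in exchange for removing lock contention from the inner loop. Whether TORT made the same trade is not stated in the abstract.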
- Research Organization:
- Oak Ridge National Lab., TN (United States)
- Sponsoring Organization:
- USDOE, Washington, DC (United States)
- DOE Contract Number:
- AC05-96OR22464
- OSTI ID:
- 425353
- Report Number(s):
- CONF-961245--3; ON: DE97001732
- Country of Publication:
- United States
- Language:
- English
Similar Records
Parallel performance of TORT on the CRAY J90: Model and measurement
Multitasking TORT under UNICOS: Parallel performance models and measurements