Temporal locality optimizations for stencil operations for parallel object-oriented scientific frameworks on cache-based architectures
High-performance scientific computing relies increasingly on high-level, large-scale object-oriented software frameworks to manage both algorithmic complexity and the complexities of parallelism: distributed data management, process management, inter-process communication, and load balancing. This encapsulation of data management, together with the prescribed semantics of a typical fundamental component of such frameworks (a parallel or serial array-class library), provides an opportunity for increasingly sophisticated compile-time optimization techniques. This paper describes a technique for introducing cache blocking suitable for certain classes of numerical algorithms, demonstrates and analyzes the resulting performance gains, and indicates how this optimization transformation is being automated.
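As a rough illustration of what cache blocking for temporal locality can look like on a stencil operation, the sketch below applies several sweeps of a 3-point averaging stencil one block at a time, so each block is reused across all sweeps before the next block is touched. It is not taken from the paper and may differ from the transformation the authors describe: the stencil, the overlapped-halo scheme, and the names `jacobi_sweeps`, `jacobi_blocked`, `steps`, and `block` are all assumptions made for illustration. Pure Python will not show the cache benefit itself; only the loop structure is the point.

```python
def jacobi_sweeps(u, steps):
    """Reference version: `steps` full-array sweeps of a 3-point
    averaging stencil, with both end points held fixed."""
    cur = list(u)
    for _ in range(steps):
        nxt = cur[:]
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def jacobi_blocked(u, steps, block):
    """Blocked version (hypothetical sketch): each block of `block` cells
    is extended by a halo of `steps` cells on each side, swept `steps`
    times locally, and only the block interior is written back.

    Errors from the artificial window edges travel one cell per sweep,
    so after `steps` sweeps they have not reached the block interior.
    The halo cells are recomputed redundantly by neighbouring blocks.
    """
    n = len(u)
    out = list(u)
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        a = max(lo - steps, 0)   # window start, clipped to the domain
        b = min(hi + steps, n)   # window end, clipped to the domain
        local = jacobi_sweeps(u[a:b], steps)
        out[lo:hi] = local[lo - a:hi - a]
    return out
```

Under these assumptions, `jacobi_blocked(u, steps, block)` returns the same values as `jacobi_sweeps(u, steps)`; the trade-off is redundant work in the halos in exchange for keeping each block's working set small across all sweeps.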
- Research Organization: Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Organization: USDOE Assistant Secretary for Human Resources and Administration, Washington, DC (United States)
- DOE Contract Number: W-7405-ENG-36
- OSTI ID: 674879
- Report Number(s): LA-UR-98-1966; CONF-981063-; ON: DE99000938; TRN: AHC29820%%334
- Resource Relation: Conference: Parallel and distributed computing systems conference, Las Vegas, NV (United States), 28-31 Oct 1998; Other Information: PBD: [1998]
- Country of Publication: United States
- Language: English
Similar Records
Improving scalability with loop transformations and message aggregation in parallel object-oriented frameworks for scientific computing
Optimizing transformations of stencil operations for parallel cache-based architectures