Efficient Extraction of Regional Subsets from Massive Climate Datasets using Parallel IO
The datasets produced by current climate models are growing rapidly and are reaching the petabyte scale. Handling data at this scale requires parallel analysis tools; however, most climate analysis software still operates at the scale of a single workstation. Furthermore, many climate analysis tools process regularly gridded data adequately but lack sufficient support for unstructured grids. This paper presents a data-parallel subsetter that correctly handles unstructured grids while scaling to over 2000 cores. The approach is based on the partitioned global address space (PGAS) parallel programming model and one-sided communication. The paper demonstrates that IO remains the single greatest bottleneck for this class of applications and that parallel analysis of climate data is practical.
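On an unstructured grid, a geographic region does not correspond to a contiguous range of array indices, so a subsetter cannot simply slice the data as it would a regular lat-lon grid; it has to test every cell's coordinates. The sketch below illustrates only that core idea and is not the paper's implementation. It assumes each cell is represented by its center latitude/longitude, that cells are split across ranks in contiguous blocks (one common way a PGAS global array is distributed), and that a cell is selected when its center lies inside a lat/lon bounding box that may wrap across the 0/360 meridian. The names `Region`, `block_range`, `local_subset`, and `parallel_subset` are hypothetical, and the "ranks" run sequentially here instead of concurrently.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical lat/lon bounding box; longitudes in degrees, [0, 360)."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float  # may be < lon_min when the box crosses the 0/360 meridian

    def contains(self, lat, lon):
        if not (self.lat_min <= lat <= self.lat_max):
            return False
        lon = lon % 360.0
        if self.lon_min <= self.lon_max:
            return self.lon_min <= lon <= self.lon_max
        # Box wraps around the meridian, e.g. 350..10 degrees.
        return lon >= self.lon_min or lon <= self.lon_max


def block_range(n_cells, n_ranks, rank):
    """Contiguous block [start, stop) of global cell indices owned by `rank`."""
    base, extra = divmod(n_cells, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_subset(lats, lons, region, n_ranks, rank):
    """Global indices of this rank's cells whose centers fall inside `region`."""
    start, stop = block_range(len(lats), n_ranks, rank)
    return [i for i in range(start, stop) if region.contains(lats[i], lons[i])]


def parallel_subset(lats, lons, region, n_ranks):
    """Simulate all ranks in turn and concatenate their results in rank order.

    In a real parallel code each rank would run local_subset concurrently on
    the data it owns, and the per-rank index lists would then be gathered or
    written with one-sided operations.
    """
    selected = []
    for rank in range(n_ranks):
        selected.extend(local_subset(lats, lons, region, n_ranks, rank))
    return selected


if __name__ == "__main__":
    lats = [10.0, 45.0, 50.0, -20.0, 60.0, 48.0]
    lons = [5.0, 355.0, 2.0, 10.0, 180.0, 1.0]
    box = Region(lat_min=40.0, lat_max=55.0, lon_min=350.0, lon_max=10.0)
    print(parallel_subset(lats, lons, box, n_ranks=4))  # [1, 2, 5]
```

Because each rank touches only its own block and the blocks are concatenated in rank order, the result does not depend on the number of ranks. Selecting by cell center is a simplification: a production subsetter may also need to handle cells that straddle the region boundary and to carry along the grid's connectivity information.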
- Research Organization:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1000798
- Report Number(s):
- PNNL-SA-71307; KJ0403000; TRN: US201101%%554
- Resource Relation:
- Conference: American Geophysical Union, Fall Meeting 2010, Paper No. IN41A-1360
- Country of Publication:
- United States
- Language:
- English
Similar Records
Efficient data IO for a Parallel Global Cloud Resolving Model
A Bloom Filter Based Scalable Data Integrity Check Tool for Large-Scale Dataset