Mini-apps for high performance data analysis
Scaling up scientific data analysis and machine learning algorithms for data-driven discovery is a grand challenge that we face today. Despite the growing need for analysis from science domains that are generating 'Big Data' from instruments and simulations, building high-performance analytical workflows of data-intensive algorithms has been daunting because: (i) the 'Big Data' hardware and software architecture landscape is constantly evolving, (ii) newer architectures impose new programming models, and (iii) data-parallel kernels of analysis algorithms and their performance facets on different architectures are poorly understood. To address these problems, we have: (i) identified scalable data-parallel kernels of popular data analysis algorithms, (ii) implemented 'Mini-Apps' of those kernels using different programming models (e.g., MapReduce, MPI), and (iii) benchmarked and validated the performance of the kernels on diverse architectures. In this paper, we discuss two of those Mini-Apps and show the execution of principal component analysis built as a workflow of the Mini-Apps. We show that Mini-Apps enable scientists to (i) write domain-specific data analysis code that scales on most HPC hardware and (ii) analyze data sizes 100 times larger than what today's off-the-shelf desktops and workstations can handle, in most cases with a speed-up of more than 10x.
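To make the idea of a data-parallel kernel concrete, the sketch below shows one way principal component analysis could be expressed as a map step over data chunks followed by a reduction. It is an illustration only and is not taken from the paper's Mini-Apps: the function names (map_chunk, combine, covariance, leading_component) are hypothetical, the code is plain single-process Python, and power iteration stands in for a full eigensolver. In a MapReduce or MPI setting, the combine step would play the role of the reduction across workers.

```python
from functools import reduce
import math


def map_chunk(rows):
    """Map step (hypothetical kernel): partial statistics for one chunk of rows."""
    d = len(rows[0])
    col_sum = [0.0] * d
    outer_sum = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            col_sum[i] += r[i]
            for j in range(d):
                outer_sum[i][j] += r[i] * r[j]
    return len(rows), col_sum, outer_sum


def combine(a, b):
    """Reduce step: add two partial results element-wise."""
    n = a[0] + b[0]
    col_sum = [x + y for x, y in zip(a[1], b[1])]
    outer_sum = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, col_sum, outer_sum


def covariance(total):
    """Sample covariance from the reduced statistics.

    This one-pass formula is simple but can lose precision when the
    means are large relative to the spread of the data.
    """
    n, col_sum, outer_sum = total
    d = len(col_sum)
    mean = [x / n for x in col_sum]
    return [
        [(outer_sum[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_component(cov, iters=200):
    """Power iteration for the leading eigenvector of the covariance matrix."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Toy data split into two chunks, as if held by two workers.
# All points lie on the line y = 2x.
chunks = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
]
total = reduce(combine, map(map_chunk, chunks))
component = leading_component(covariance(total))
print(component)
```

Because each chunk contributes only a count, a vector of column sums, and a small matrix of outer-product sums, the amount of data exchanged in the reduction depends on the number of features rather than the number of rows. For the toy data above, the leading component points along the direction (1, 2), normalized to unit length.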
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Organization:
- USDOE Office of Science; USDOE
- OSTI ID:
- 1567561
- Country of Publication:
- United States
- Language:
- English
Similar Records
On-demand data analytics in HPC environments at leadership computing facilities: Challenges and experiences
Journal Article · Wed Nov 30 23:00:00 EST 2016 · OSTI ID:1567562
Review of Literature Related to Nuclear Data Mini-Apps
Technical Report · Sun Feb 28 23:00:00 EST 2021 · OSTI ID:1773243
Scaling and performance portability of the particle-in-cell scheme for plasma physics applications through mini-apps targeting exascale architectures
Conference · Thu Feb 29 23:00:00 EST 2024 · OSTI ID:2438748