Design and performance of a scalable, parallel statistics toolkit.
Conference
·
OSTI ID:1035311
Most statistical software packages implement a broad range of techniques but do so in an ad hoc fashion, leaving users who do not have a broad knowledge of statistics at a disadvantage, since they may not understand all the implications of a given analysis or how to test the validity of results. These packages are also largely serial in nature, or target multicore architectures instead of distributed-memory systems, or provide only a small number of statistics in parallel. This paper surveys a collection of parallel implementations of statistics algorithms developed as part of a common framework over the last 3 years. The framework strategically groups modeling techniques with associated verification and validation techniques to make the underlying assumptions of the statistics clearer. Furthermore, it employs a design pattern specifically targeted at distributed-memory parallelism, where architectural advances in large-scale high-performance computing have been focused. Moment-based statistics (which include descriptive, correlative, and multicorrelative statistics, principal component analysis (PCA), and k-means statistics) scale nearly linearly with the data set size and number of processes. Entropy-based statistics (which include order and contingency statistics) do not scale well when the data in question is continuous or quasi-diffuse but do scale well when the data is discrete and compact. We confirm and extend our earlier results by now establishing near-optimal scalability with up to 10,000 processes.
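As a rough illustration of why moment-based statistics lend themselves to distributed-memory parallelism, the sketch below computes a count, mean, and variance independently on each data partition and then combines the fixed-size partial results with a well-known pairwise update formula. It is not the toolkit's actual code or API: the names `Moments`, `local_moments`, `merge`, and `reduce_all` are hypothetical, and the serial loop in `reduce_all` stands in for whatever reduction a real distributed run would perform across processes.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Moments:
    """Hypothetical partial aggregate for one partition:
    count, mean, and sum of squared deviations from the mean."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @property
    def variance(self) -> float:
        # Unbiased sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def local_moments(values: Iterable[float]) -> Moments:
    """Single pass over one partition using Welford's online update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Moments(n, mean, m2)


def merge(a: Moments, b: Moments) -> Moments:
    """Combine two partial aggregates without revisiting the raw data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def reduce_all(parts: Sequence[Moments]) -> Moments:
    """Serial stand-in for a cross-process reduction (an assumption here)."""
    total = Moments()
    for part in parts:
        total = merge(total, part)
    return total


if __name__ == "__main__":
    partitions = ([1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0])
    total = reduce_all([local_moments(p) for p in partitions])
    print(total.n, total.mean, total.variance)
```

Each partial aggregate has a fixed size regardless of how much data the partition holds, so the amount exchanged between processes does not grow with the data set. A contingency table, by contrast, grows with the number of distinct values observed, which is consistent with the abstract's remark that entropy-based statistics scale poorly on continuous or quasi-diffuse data.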
- Research Organization:
- Sandia National Laboratories
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC04-94AL85000
- OSTI ID:
- 1035311
- Report Number(s):
- SAND2010-8143C
- Country of Publication:
- United States
- Language:
- English
Similar Records
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
Conference
·
Mon Apr 16 00:00:00 EDT 2007
·
OSTI ID:920852
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
Conference
·
Sun Dec 31 23:00:00 EST 2006
·
OSTI ID:1407083
A Parallel Computing Framework for Finding the Supported Solutions to a Biobjective Network Optimization Problem
Journal Article
·
Mon Apr 06 20:00:00 EDT 2015
· Journal of Multi-Criteria Decision Analysis
·
OSTI ID:1401385