Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining

Hero, Alfred O.; Rajaratnam, Bala

doi:10.1109/JPROC.2015.2494178


Journal Article · December 9, 2015 · Proceedings of the IEEE

DOI: https://doi.org/10.1109/JPROC.2015.2494178 · OSTI ID: 1367662

Hero, Alfred O. [1]; Rajaratnam, Bala [2]

[1] Univ. of Michigan, Ann Arbor, MI (United States)
[2] Stanford Univ., CA (United States)

When can reliable inference be drawn in the “Big Data” context? This article presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the data set is often variable rich but sample starved: a regime where the number n of acquired samples (statistical replicates) is far smaller than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for “Big Data.” Sample complexity, however, has received relatively less attention, especially in the setting where the sample size n is fixed and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into three categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; and 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche, but only the last (purely high-dimensional) regime applies to exascale data dimensions. We illustrate this high-dimensional framework for the problem of correlation mining, where the quantities of interest are the pairwise and partial correlations among the variables. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
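To make the purely high-dimensional regime (n fixed, p growing) concrete, the sketch below implements plain correlation screening: compute the p × p sample correlation matrix from n samples and report every variable pair whose absolute correlation exceeds a threshold rho. It is an illustration under stated assumptions, not the authors' procedure. The helper names (`sample_correlation`, `screen_correlations`, `null_discoveries`), the independent Gaussian null model, and the values n = 20 and rho = 0.6 are all hypothetical choices made here. Under that null every reported pair is a false discovery, so the output shows how a fixed threshold breaks down as p grows while n stays fixed.

```python
import numpy as np


def sample_correlation(X):
    """Return the p x p sample (Pearson) correlation matrix of an n x p data matrix X."""
    Xc = X - X.mean(axis=0)               # center each variable (column)
    Xc = Xc / np.linalg.norm(Xc, axis=0)  # scale each column to unit Euclidean norm
    return Xc.T @ Xc                      # inner products of normalized columns = correlations


def screen_correlations(X, rho):
    """Hypothetical helper: index pairs (i, j), i < j, whose |sample correlation| exceeds rho."""
    R = sample_correlation(X)
    i, j = np.triu_indices(R.shape[0], k=1)  # strict upper triangle: each pair once
    keep = np.abs(R[i, j]) > rho
    return list(zip(i[keep].tolist(), j[keep].tolist()))


def null_discoveries(n, p, rho, seed=0):
    """Count pairs passing the threshold when all p Gaussian variables are independent."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))  # n samples (rows) of p variables (columns)
    return len(screen_correlations(X, rho))


if __name__ == "__main__":
    n, rho = 20, 0.6  # fixed sample size and threshold (illustrative values)
    for p in (100, 400, 1600):
        # Every discovery here is a false positive: the variables are independent.
        print(f"p = {p:5d}: {null_discoveries(n, p, rho)} pairs with |r| > {rho}")
```

Because every pair has the same null distribution, the expected number of spurious pairs at fixed n and rho grows in proportion to the number of pairs, p(p - 1)/2. In the purely high-dimensional regime the threshold (or the required sample size) therefore has to be calibrated against p, which is the kind of sample-complexity trade-off the article sets out to quantify.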


Research Organization: Univ. of Michigan, Ann Arbor, MI (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA); National Institutes of Health (NIH); National Science Foundation (NSF); US Army Research Office (ARO)

Grant/Contract Number: NA0002534; FA9550-13-1-0043; W911NF-11-1-0391; W911NF-12-1-0443; 2P01CA087634-06A2; DMS-0906392; DMS-CMG1025465; AGS-1003823; DMS-1106642; DMS-CAREER-1352656; DARPA-YFAN66001-111-4131

OSTI ID: 1367662

Journal Information: Proceedings of the IEEE, Vol. 104, Issue 1; ISSN 0018-9219

Publisher: Institute of Electrical and Electronics Engineers

Country of Publication: United States

Language: English

Citation Metrics: cited by 13 works (citation information provided by Web of Science)

Similar Records

A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees

Journal Article · September 26, 2014 · Journal of the Royal Statistical Society: Series B (Statistical Methodology)

Khare, Kshitij; Oh, Sang-Yun; Rajaratnam, Bala

Statistical theory of nuclear reactions and the Gaussian orthogonal ensemble

Journal Article · November 1, 1984 · Ann. Phys. (N.Y.); (United States)

Weidenmueller, H A

Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications

Journal Article · February 3, 2021 · Journal of Computational Physics

Wang, Yating; Deng, Wei; Lin, Guang

Related Subjects

97 MATHEMATICS AND COMPUTING
Asymptotic regimes
big data
correlation estimation
correlation mining
correlation screening
correlation selection
graphical models
large-scale inference
purely high dimensional
sample complexity
triple asymptotic framework
unifying learning theory
