U.S. Department of Energy
Office of Scientific and Technical Information

(Active) Learning on Groups of Data with Information-Theoretic Estimators

Technical Report
DOI: https://doi.org/10.2172/1455353 · OSTI ID: 1455353
 [1];  [2];  [3]
  1. Carnegie Mellon Univ., Pittsburgh, PA (United States)
  2. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
  3. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

A wide range of machine learning problems, including astronomical inference about galaxy clusters, scene classification, parametric statistical inference, and prediction of public opinion, can be modeled well as learning a function on (samples from) distributions. This project explores learning such functions via kernel methods, particularly at large scale. When learning from large numbers of distributions, the computational cost of typical methods scales between quadratically and cubically in the number of distributions, so they are not practical for large datasets. We investigate approximate embeddings into Euclidean spaces such that inner products in the embedding space approximate kernel values between the source distributions. We first improve the understanding of the workhorse method of random Fourier features: we show that, of the two variants in common use, one is strictly superior. We then present a new embedding for a class of information-theoretic distribution distances, and evaluate it and existing embeddings on several real-world applications.
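As a rough illustration of the embedding idea described in the abstract, the following minimal sketch approximates a Gaussian kernel with random Fourier features. It is not taken from the report. The two feature maps shown (a cosine/sine pair per frequency, and a single cosine with a random phase offset) are assumed here to be the two variants in common use that the abstract refers to; the sketch makes no claim about which one the report finds superior. Averaging features over a sample set, as in the hypothetical helper `embed_sample_set`, gives one simple way to approximate a kernel between distributions (a mean-embedding kernel); it is not the report's new embedding for information-theoretic distances. All function names, the bandwidth `sigma`, and the example points are assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sample_frequencies(dim, n_freqs, sigma=1.0, seed=0):
    """Draw frequencies from N(0, I / sigma^2), the kernel's spectral measure."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
            for _ in range(n_freqs)]


def dot(u, v):
    """Plain Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def embed_paired(x, freqs):
    """Assumed variant 1: a (cos, sin) pair per frequency, 2 * len(freqs) features."""
    scale = 1.0 / math.sqrt(len(freqs))
    feats = []
    for w in freqs:
        p = dot(w, x)
        feats.extend((scale * math.cos(p), scale * math.sin(p)))
    return feats


def embed_offset(x, freqs, offsets):
    """Assumed variant 2: one cosine with a random phase offset per frequency."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(dot(w, x) + b) for w, b in zip(freqs, offsets)]


def embed_sample_set(samples, freqs):
    """Hypothetical helper: average paired features over a sample set."""
    feats = [embed_paired(s, freqs) for s in samples]
    return [sum(col) / len(samples) for col in zip(*feats)]


# Illustrative points; both embeddings below have 4000 dimensions.
x, y = [0.3, -0.2], [0.5, 0.1]
exact = gaussian_kernel(x, y)

freqs = sample_frequencies(dim=2, n_freqs=2000, seed=0)
approx_paired = dot(embed_paired(x, freqs), embed_paired(y, freqs))

freqs_off = sample_frequencies(dim=2, n_freqs=4000, seed=1)
phase_rng = random.Random(2)
offsets = [phase_rng.uniform(0.0, 2.0 * math.pi) for _ in freqs_off]
approx_offset = dot(embed_offset(x, freqs_off, offsets),
                    embed_offset(y, freqs_off, offsets))

# Two small sample sets standing in for samples from two distributions.
P = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.3]]
Q = [[0.5, 0.5], [0.4, 0.7]]
exact_sets = sum(gaussian_kernel(p, q) for p in P for q in Q) / (len(P) * len(Q))
approx_sets = dot(embed_sample_set(P, freqs), embed_sample_set(Q, freqs))

print("point kernel:", exact, approx_paired, approx_offset)
print("sample-set kernel:", exact_sets, approx_sets)
```

Once every sample set is mapped to a fixed-length vector in this way, a linear model can be trained on the vectors instead of forming the full kernel matrix between all pairs of distributions, which is the source of the quadratic-to-cubic cost mentioned above.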

Research Organization:
Sandia National Laboratories (SNL-CA), Livermore, CA (United States); Sandia National Laboratories, Albuquerque, NM
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1455353
Report Number(s):
SAND2016--8837R; 664417
Country of Publication:
United States
Language:
English
