OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: (Active) Learning on Groups of Data with Information-Theoretic Estimators

Technical Report · DOI: https://doi.org/10.2172/1455353 · OSTI ID: 1455353
 [1];  [2];  [3]
  1. Carnegie Mellon Univ., Pittsburgh, PA (United States)
  2. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
  3. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

A wide range of machine learning problems can be modeled well as learning a function on distributions, or on samples drawn from them. Examples include astronomical inference about galaxy clusters, scene classification, parametric statistical inference, and predictions of public opinion. This project explores learning such functions with kernel methods, particularly for large-scale problems. When learning from large numbers of distributions, the computational cost of typical methods scales quadratically to cubically in the number of distributions, so these methods are not suited to large datasets. We investigate approximate embeddings into Euclidean spaces, constructed so that inner products in the embedding space approximate kernel values between the source distributions. We first improve the understanding of the workhorse method, random Fourier features: we show that of the two variants in common use, one is strictly superior. We then present a new embedding for a class of information-theoretic distances between distributions, and evaluate it and existing embeddings on several real-world applications.
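As a rough illustration of the embedding idea, the Python sketch below (standard library only) approximates a Gaussian kernel with random Fourier features in the two forms commonly used in practice: paired cos/sin features, and cosine features with a random phase shift. It then averages the features over a sample set, so that the inner product of two averaged embeddings approximates the mean kernel value between two sets of samples. This is the kernel mean embedding that underlies MMD. It is a sketch under stated assumptions, not the report's code. The function names, the bandwidth `sigma`, and the choice of a Gaussian kernel with a mean embedding are illustrative. The sketch does not reproduce the report's analysis of which variant is better, nor its new information-theoretic embedding.

```python
import math
import random

# Illustrative sketch only: all names and defaults here are hypothetical, not from the report.

def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def _draw_frequencies(rng, dim, n, sigma):
    # Frequencies w ~ N(0, sigma^-2 I): the Fourier transform of the Gaussian kernel.
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n)]

def make_paired_features(dim, D, sigma=1.0, seed=0):
    """z(x) = sqrt(2/D) [cos(w_i.x), sin(w_i.x)] for i = 1..D/2 (length D; D even)."""
    rng = random.Random(seed)
    ws = _draw_frequencies(rng, dim, D // 2, sigma)
    scale = math.sqrt(2.0 / D)

    def z(x):
        out = []
        for w in ws:
            t = sum(wi * xi for wi, xi in zip(w, x))
            out.extend((scale * math.cos(t), scale * math.sin(t)))
        return out
    return z

def make_phase_features(dim, D, sigma=1.0, seed=0):
    """z(x) = sqrt(2/D) [cos(w_i.x + b_i)] for i = 1..D with b_i ~ Uniform[0, 2 pi)."""
    rng = random.Random(seed)
    ws = _draw_frequencies(rng, dim, D, sigma)
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    scale = math.sqrt(2.0 / D)

    def z(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_embedding(z, samples):
    """Average the feature vectors of a sample set (estimates the distribution's mean embedding)."""
    feats = [z(x) for x in samples]
    return [sum(col) / len(feats) for col in zip(*feats)]

def mean_kernel(X, Y, sigma=1.0):
    """Exact mean of k(x, y) over all pairs; the dot product of mean embeddings approximates it."""
    return sum(gaussian_kernel(x, y, sigma) for x in X for y in Y) / (len(X) * len(Y))

def mmd_squared(z, X, Y):
    """Random-feature approximation of the biased MMD^2 estimate ||mu_X - mu_Y||^2."""
    mx, my = mean_embedding(z, X), mean_embedding(z, Y)
    return sum((a - b) ** 2 for a, b in zip(mx, my))
```

Both variants return vectors of length `D`, so they can be compared at equal embedding size. Once each sample set is reduced to one averaged vector, a linear model on those vectors can stand in for a kernel method. This avoids computing a kernel matrix over all pairs of distributions.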

Research Organization:
Sandia National Lab. (SNL-CA), Livermore, CA (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1455353
Report Number(s):
SAND2016-8837R; 664417
Country of Publication:
United States
Language:
English

Similar Records

Quantum Machine-Learning for Eigenstate Filtration in Two-Dimensional Materials
Journal Article · Wed Oct 27 00:00:00 EDT 2021 · Journal of the American Chemical Society · OSTI ID:1455353

Scalable and efficient learning from crowds with Gaussian processes
Journal Article · Wed Jan 02 00:00:00 EST 2019 · Information Fusion · OSTI ID:1455353

Lift & Learn: Physics-informed machine learning for large-scale nonlinear dynamical systems
Journal Article · Wed Feb 19 00:00:00 EST 2020 · Physica. D, Nonlinear Phenomena · OSTI ID:1455353
