(Active) Learning on Groups of Data with Information-Theoretic Estimators

Sutherland, Dougal; Kegelmeyer, W. Philip; Hutchinson, Robert L.

doi:10.2172/1455353

Technical Report · September 1, 2016

DOI: https://doi.org/10.2172/1455353 · OSTI ID: 1455353

Sutherland, Dougal [1]; Kegelmeyer, W. Philip [2]; Hutchinson, Robert L. [3]

[1] Carnegie Mellon Univ., Pittsburgh, PA (United States)
[2] Sandia National Lab. (SNL-CA), Livermore, CA (United States)
[3] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

A wide range of machine learning problems, including astronomical inference about galaxy clusters, scene classification, parametric statistical inference, and predictions of public opinion, can be well modeled as learning a function on (samples from) distributions. This project explores problems in learning such functions via kernel methods, particularly for large-scale problems. When learning from large numbers of distributions, the computational cost of typical methods scales between quadratically and cubically in the number of distributions, so they are impractical for large datasets. We investigate approximate embeddings into Euclidean spaces, chosen so that inner products in the embedding space approximate kernel values between the source distributions. We first improve the understanding of the workhorse method of random Fourier features: we show that, of the two variants in common use, one is strictly superior. We then present a new embedding for a class of information-theoretic distribution distances, and evaluate it and existing embeddings on several real-world applications.
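To make the embedding idea concrete, the following is a minimal sketch, not the report's implementation. It assumes a Gaussian kernel with bandwidth `sigma` and shows the two random Fourier feature variants that are commonly used (paired cosine/sine features, and single cosine features with a random phase). The abstract does not say which variant is the superior one, and this sketch does not try to settle that. The `mean_embedding` helper illustrates one simple, well-known way to embed a sample set (averaging its features, which approximates the mean of the kernel over all cross pairs); it is a stand-in for illustration and is not the new information-theoretic embedding the report describes. All function names, the feature count `D`, and the synthetic data are assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def rff_paired(X, W):
    """Variant 1: a cosine and a sine feature per frequency (2*D outputs)."""
    proj = X @ W.T
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(W.shape[0])

def rff_offset(X, W, b):
    """Variant 2: one cosine feature per frequency with a random phase."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)

def mean_embedding(sample, W):
    """Embed a whole sample set as the average of its point features."""
    return rff_paired(sample, W).mean(axis=0)

rng = np.random.default_rng(0)
dim, D, sigma = 3, 2000, 1.0            # illustrative sizes (assumptions)
# For the Gaussian kernel, frequencies are drawn from N(0, sigma^-2 I).
W = rng.normal(scale=1.0 / sigma, size=(D, dim))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

# Point-level check: inner products of features approximate kernel values.
X = rng.normal(size=(50, dim))
K = gaussian_kernel(X, X, sigma)
Z_paired = rff_paired(X, W)
Z_offset = rff_offset(X, W, b)
err_paired = np.abs(Z_paired @ Z_paired.T - K).max()
err_offset = np.abs(Z_offset @ Z_offset.T - K).max()

# Distribution-level check: two sample sets, one vector each.
P = rng.normal(loc=0.0, size=(200, dim))
Q = rng.normal(loc=0.5, size=(300, dim))
approx_pq = mean_embedding(P, W) @ mean_embedding(Q, W)
exact_pq = gaussian_kernel(P, Q, sigma).mean()   # average over all pairs
print(err_paired, err_offset, abs(approx_pq - exact_pq))
```

Once every sample set is reduced to a fixed-length vector in this way, a linear model can be trained on those vectors, avoiding the need to build and factor a full kernel matrix over all pairs of distributions.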

Research Organization: Sandia National Lab. (SNL-CA), Livermore, CA (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program

DOE Contract Number: AC04-94AL85000

OSTI ID: 1455353

Report Number(s): SAND2016-8837R; 664417

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
