(Active) Learning on Groups of Data with Information-Theoretic Estimators
- Carnegie Mellon Univ., Pittsburgh, PA (United States)
- Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
A wide range of machine learning problems, including astronomical inference about galaxy clusters, scene classification, parametric statistical inference, and predictions of public opinion, can be well-modeled as learning a function on (samples from) distributions. This project explores problems in learning such functions via kernel methods, particularly for large-scale problems. When learning from large numbers of distributions, the computational cost of typical methods scales between quadratically and cubically in the number of distributions, so they are not amenable to large datasets. We investigate approximate embeddings into Euclidean spaces such that inner products in the embedding space approximate kernel values between the source distributions. We first improve the understanding of the workhorse method of random Fourier features: we show that, of the two variants in common usage, one is strictly superior. We then present a new embedding for a class of information-theoretic distribution distances, and evaluate it and existing embeddings on several real-world applications.
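As a rough illustration of the embedding idea described in the abstract, the following minimal sketch builds random Fourier features for a Gaussian kernel in the two commonly used forms (paired cosine/sine features, and cosine features with a random phase offset) and averages point features over a sample set to embed a distribution. It is not taken from the report: the function names, the choice of the Gaussian kernel, the bandwidth parameter `sigma`, and the use of a mean embedding are all assumptions made for illustration, and the sketch does not implement the report's new embedding for information-theoretic distances.

```python
import math
import random


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sample_frequencies(dim, n_freqs, sigma, rng):
    """Draw frequencies from N(0, I / sigma^2), the spectral measure
    of the Gaussian kernel with bandwidth sigma (assumed kernel)."""
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
            for _ in range(n_freqs)]


def sample_offsets(n_freqs, rng):
    """Draw phase offsets uniformly from [0, 2*pi)."""
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_freqs)]


def rff_paired(x, freqs):
    """Variant 1: a (cos, sin) pair per frequency; output has 2 * len(freqs) entries."""
    scale = math.sqrt(1.0 / len(freqs))
    feats = []
    for w in freqs:
        proj = dot(w, x)
        feats.append(scale * math.cos(proj))
        feats.append(scale * math.sin(proj))
    return feats


def rff_offset(x, freqs, offsets):
    """Variant 2: one cosine with a random phase per frequency; output has len(freqs) entries."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(dot(w, x) + b) for w, b in zip(freqs, offsets)]


def mean_embedding(samples, freqs):
    """Embed a sample set by averaging its paired point features.
    Inner products of two such vectors approximate the average kernel
    value between the two sample sets (illustrative assumption)."""
    feats = [rff_paired(x, freqs) for x in samples]
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]
```

In both variants the inner product of two feature vectors is an unbiased estimate of the kernel value, so a linear method applied to the features stands in for a kernel method at a cost that grows linearly, rather than quadratically or worse, with the number of inputs. The sketch makes no claim about which variant the report finds superior.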
- Research Organization: Sandia National Lab. (SNL-CA), Livermore, CA (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program
- DOE Contract Number: AC04-94AL85000
- OSTI ID: 1455353
- Report Number(s): SAND2016-8837R; 664417
- Country of Publication: United States
- Language: English