DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Optimizing Kernel Machines Using Deep Learning

Abstract

Building highly non-linear and non-parametric models is central to several state-of-the-art machine learning systems. Kernel methods form an important class of techniques that induce a reproducing kernel Hilbert space (RKHS) for inferring non-linear models through the construction of similarity functions from data. These methods are particularly preferred when training data are limited and when prior knowledge of the data similarities is available. Despite their usefulness, they are limited by their computational complexity and by their inability to support end-to-end learning with a task-specific objective. On the other hand, deep neural networks have become the de facto solution for end-to-end inference in several learning paradigms. In this article, we explore the idea of using deep architectures to perform kernel machine optimization, for both computational efficiency and end-to-end inferencing. To this end, we develop the DKMO (Deep Kernel Machine Optimization) framework, which creates an ensemble of dense embeddings using Nyström kernel approximations and utilizes deep learning to generate task-specific representations through the fusion of the embeddings. Intuitively, the filters of the network are trained to fuse information from an ensemble of linear subspaces in the RKHS. Furthermore, we introduce kernel dropout regularization to enable improved training convergence. Finally, we extend this framework to the multiple kernel case by coupling a global fusion layer with pre-trained deep kernel machines for each of the constituent kernels. Using case studies with limited training data and lack of explicit feature sources, we demonstrate the effectiveness of our framework over conventional model inferencing techniques.
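The first step described in the abstract, turning one kernel into an ensemble of dense embeddings via Nyström approximation, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' DKMO code: the Gaussian (RBF) kernel, uniform random landmark sampling, and the helper names `rbf_kernel`, `nystrom_embedding`, and `nystrom_ensemble` are all hypothetical choices made here. Each embedding E satisfies E Eᵀ = C W⁺ Cᵀ, where C holds the landmark columns of K and W is the landmark-by-landmark block.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Assumed example kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative distances

def nystrom_embedding(K, landmarks, eps=1e-10):
    """Hypothetical helper: dense embedding E with E @ E.T approximating K.

    K: (n, n) symmetric PSD kernel matrix (only landmark columns are read).
    landmarks: indices of the m landmark samples.
    """
    C = K[:, landmarks]                 # (n, m) landmark columns of K
    W = C[landmarks, :]                 # (m, m) landmark block
    vals, vecs = np.linalg.eigh(W)      # W = V diag(vals) V^T
    keep = vals > eps                   # discard near-zero eigenvalues (pseudo-inverse)
    # E = C V diag(vals^-1/2), hence E E^T = C W^+ C^T
    return C @ vecs[:, keep] / np.sqrt(vals[keep])

def nystrom_ensemble(K, num_embeddings, m, seed=0):
    # Assumed sampling scheme: one embedding per random landmark subset,
    # each spanning a different linear subspace of the RKHS.
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    return [nystrom_embedding(K, rng.choice(n, size=m, replace=False))
            for _ in range(num_embeddings)]
```

Because `nystrom_embedding` reads only the kernel matrix, the same sketch applies when no explicit features are available; in practice one would compute just the n-by-m landmark columns instead of the full K.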

Authors:
Song, Huan [1]; Thiagarajan, Jayaraman J. [2]; Sattigeri, Prasanna [3]; Spanias, Andreas [1]
  1. Arizona State Univ., Tempe, AZ (United States)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  3. IBM, Yorktown Heights, NY (United States). Thomas J. Watson Research Center
Publication Date:
March 6, 2018
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1463836
Report Number(s):
LLNL-JRNL-753878
Journal ID: ISSN 2162-237X; 896744
Grant/Contract Number:  
AC52-07NA27344
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Transactions on Neural Networks and Learning Systems
Additional Journal Information:
Journal Volume: 29; Journal Issue: 11; Journal ID: ISSN 2162-237X
Publisher:
IEEE Computational Intelligence Society
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Song, Huan, Thiagarajan, Jayaraman J., Sattigeri, Prasanna, and Spanias, Andreas. Optimizing Kernel Machines Using Deep Learning. United States: N. p., 2018. Web. doi:10.1109/TNNLS.2018.2804895.
Song, Huan, Thiagarajan, Jayaraman J., Sattigeri, Prasanna, & Spanias, Andreas. Optimizing Kernel Machines Using Deep Learning. United States. https://doi.org/10.1109/TNNLS.2018.2804895
Song, Huan, Thiagarajan, Jayaraman J., Sattigeri, Prasanna, and Spanias, Andreas. 2018. "Optimizing Kernel Machines Using Deep Learning." United States. https://doi.org/10.1109/TNNLS.2018.2804895. https://www.osti.gov/servlets/purl/1463836.
@article{osti_1463836,
title = {Optimizing Kernel Machines Using Deep Learning},
author = {Song, Huan and Thiagarajan, Jayaraman J. and Sattigeri, Prasanna and Spanias, Andreas},
abstractNote = {Building highly non-linear and non-parametric models is central to several state-of-the-art machine learning systems. Kernel methods form an important class of techniques that induce a reproducing kernel Hilbert space (RKHS) for inferring non-linear models through the construction of similarity functions from data. These methods are particularly preferred when training data are limited and when prior knowledge of the data similarities is available. Despite their usefulness, they are limited by their computational complexity and by their inability to support end-to-end learning with a task-specific objective. On the other hand, deep neural networks have become the de facto solution for end-to-end inference in several learning paradigms. In this article, we explore the idea of using deep architectures to perform kernel machine optimization, for both computational efficiency and end-to-end inferencing. To this end, we develop the DKMO (Deep Kernel Machine Optimization) framework, which creates an ensemble of dense embeddings using Nystr{\"o}m kernel approximations and utilizes deep learning to generate task-specific representations through the fusion of the embeddings. Intuitively, the filters of the network are trained to fuse information from an ensemble of linear subspaces in the RKHS. Furthermore, we introduce kernel dropout regularization to enable improved training convergence. Finally, we extend this framework to the multiple kernel case by coupling a global fusion layer with pre-trained deep kernel machines for each of the constituent kernels. Using case studies with limited training data and lack of explicit feature sources, we demonstrate the effectiveness of our framework over conventional model inferencing techniques.},
doi = {10.1109/TNNLS.2018.2804895},
journal = {IEEE Transactions on Neural Networks and Learning Systems},
number = 11,
volume = 29,
place = {United States},
year = {2018},
month = mar
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 32 works
Citation information provided by Web of Science

Figures / Tables:

Figure 1: DKMO, the proposed approach for optimizing kernel machines using deep neural networks. For a given kernel, we generate multiple dense embeddings using kernel approximation techniques and fuse them in a fully connected deep neural network. The architecture utilizes fully connected networks with kernel dropout regularization during the fusion stage. Our approach can handle scenarios where both the feature sources and the kernel matrix are available during training, or where only the kernel similarities can be accessed.
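The fusion stage in Figure 1 can be illustrated with a forward pass in NumPy: each dense embedding feeds its own fully connected branch, the branch outputs are concatenated, and a linear layer produces class scores. This is a hedged sketch, not the published architecture. It assumes that kernel dropout means dropping entire embedding branches during training (an interpretation of the name, with inverted-dropout rescaling), and the layer sizes, ReLU activation, He-style initialization, and the names `init_fusion_params`, `kernel_dropout_mask`, and `fusion_forward` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_fusion_params(embed_dims, hidden, num_classes, seed=0):
    # Hypothetical layout: one fully connected branch per embedding,
    # followed by a shared linear fusion/output layer.
    rng = np.random.default_rng(seed)
    branches = [(rng.standard_normal((d, hidden)) * np.sqrt(2.0 / d), np.zeros(hidden))
                for d in embed_dims]
    fuse_in = hidden * len(embed_dims)
    W_out = rng.standard_normal((fuse_in, num_classes)) * np.sqrt(2.0 / fuse_in)
    return branches, (W_out, np.zeros(num_classes))

def kernel_dropout_mask(num_embeddings, p, rng):
    # Assumed form of kernel dropout: zero out whole embeddings (branches),
    # not individual units; survivors are scaled by 1/(1-p) as in inverted dropout.
    keep = rng.random(num_embeddings) >= p
    if not keep.any():                  # always keep at least one branch
        keep[rng.integers(num_embeddings)] = True
    return keep / (1.0 - p)

def fusion_forward(embeddings, branches, out_layer, p=0.0, rng=None):
    """Logits of shape (n, num_classes); p > 0 applies kernel dropout (training only)."""
    mask = np.ones(len(embeddings))
    if p > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        mask = kernel_dropout_mask(len(embeddings), p, rng)
    # Each branch output is multiplied by its scalar mask entry.
    hs = [m * relu(E @ W + b) for m, E, (W, b) in zip(mask, embeddings, branches)]
    fused = np.concatenate(hs, axis=1)  # (n, hidden * num_embeddings)
    W_out, b_out = out_layer
    return fused @ W_out + b_out
```

Training these weights (for example with a softmax cross-entropy loss) would normally be done in a deep-learning framework; the sketch only shows how dropping whole embeddings differs from ordinary unit-level dropout.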
