Approximate l-fold cross-validation with Least Squares SVM and Kernel Ridge Regression
- ORNL
Kernel methods have difficulty scaling to large modern data sets. The scalability issues stem from the computational and memory requirements of working with a large kernel matrix. These requirements have been addressed over the years by using low-rank kernel approximations or by improving solver scalability. However, Least Squares Support Vector Machines (LS-SVM), a popular SVM variant, and Kernel Ridge Regression still have several scalability issues. In particular, the O(n^3) computational complexity of solving a single model and the overall computational cost of tuning hyperparameters remain major problems. We address these problems by introducing an O(n log n) approximate l-fold cross-validation method that uses a multi-level circulant matrix to approximate the kernel. In addition, we prove our algorithm's computational complexity and present empirical runtimes on data sets with approximately 1 million data points. We also validate our approximate method's effectiveness at selecting hyperparameters on real-world and standard benchmark data sets. Lastly, we provide experimental results on using a multi-level circulant kernel approximation to solve LS-SVM problems with hyperparameters selected using our method.
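The key idea behind the O(n log n) cost is that a circulant matrix is diagonalized by the discrete Fourier transform, so a regularized linear system with a circulant kernel approximation can be solved with FFTs instead of an O(n^3) factorization. The sketch below is not the paper's multi-level algorithm; it is a minimal one-level illustration, assuming equispaced 1-D points with a periodic (wraparound) RBF kernel so that the kernel matrix is exactly circulant. The function names and parameters (`circulant_first_col`, `circulant_krr_solve`, `sigma`, `lam`) are hypothetical, chosen for the example.

```python
import numpy as np

def circulant_first_col(n, h=0.1, sigma=0.5):
    # First column of a circulant approximation to an RBF kernel on
    # n equispaced points with spacing h, using wraparound distance
    # so that K[i, j] depends only on (i - j) mod n.
    j = np.arange(n)
    d = np.minimum(j, n - j) * h
    return np.exp(-d**2 / (2.0 * sigma**2))

def circulant_krr_solve(c, y, lam):
    # Solve (C + lam * I) alpha = y in O(n log n) via the FFT.
    # A circulant C with first column c satisfies C x = ifft(fft(c) * fft(x)),
    # so in the Fourier domain the system is diagonal.
    eig = np.fft.fft(c)                       # eigenvalues of C
    alpha = np.fft.ifft(np.fft.fft(y) / (eig + lam))
    return alpha.real                         # c symmetric => real solution

# Usage: solve a kernel ridge regression system and verify it densely.
rng = np.random.default_rng(0)
n = 256
c = circulant_first_col(n)
y = rng.standard_normal(n)
alpha = circulant_krr_solve(c, y, lam=1e-2)

# Dense check: build C explicitly (column k is c shifted down by k).
C = np.stack([np.roll(c, k) for k in range(n)], axis=1)
residual = np.linalg.norm((C + 1e-2 * np.eye(n)) @ alpha - y)
```

Because each solve is O(n log n), repeating it across folds and hyperparameter candidates stays cheap, which is what makes FFT-based approximate cross-validation attractive relative to refactoring a dense kernel matrix per fold.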
- Research Organization:
- Oak Ridge National Laboratory (ORNL)
- Sponsoring Organization:
- USDOE - Office of Energy Efficiency and Renewable Energy (EE)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1111451
- Country of Publication:
- United States
- Language:
- English