OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems

Conference
Authors: [1]; [1]; [2]; [3]
  1. Univ. of California, Berkeley, CA (United States)
  2. Univ. of California, Davis, CA (United States)
  3. Georgia Inst. of Technology, Atlanta, GA (United States)

Kernel Ridge Regression (KRR) is a fundamental method in machine learning. Given an n-by-d data matrix as input, a traditional implementation requires Θ(n²) memory to form an n-by-n kernel matrix and Θ(n³) flops to compute the final model. These time and storage costs prohibit KRR from scaling up to large datasets. For example, even on a relatively small dataset (a 520k-by-90 input requiring 357 MB), KRR requires 2 TB of memory just to store the kernel matrix; this is because n is usually much larger than d in real-world applications. Weak scaling is also a problem: if we keep d and n/p fixed as p grows (p is the number of machines), the memory needed grows as Θ(p) per processor and the flops as Θ(p²) per processor, since the kernel matrix has n² = (n/p)²·p² entries and the solve costs Θ(n³) flops, shared among p processors. In the perfect weak scaling situation, both the memory needed and the flops grow as Θ(1) per processor (i.e., memory and flops are constant). The traditional distributed KRR implementation (DKRR) achieved only 0.32% weak scaling efficiency from 96 to 1536 processors. In this work, we propose two new methods to address these problems: Balanced KRR (BKRR) and K-means KRR (KKRR). Both partition the input dataset into p different parts, generate p different models, and then select the best model among them. Compared to a conventional implementation, KKRR2 (an optimized version of KKRR) improves the weak scaling efficiency from 0.32% to 38% and achieves a 591x speedup for reaching the same accuracy with the same data and the same hardware (1536 processors). BKRR2 (an optimized version of BKRR) achieves higher accuracy than the current fastest method in less training time on a variety of datasets. For applications requiring only approximate solutions, BKRR2 improves the weak scaling efficiency to 92% and achieves a 3505x speedup (theoretical speedup: 4096x).
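To make the cost argument and the partition-and-select idea concrete, the following is a minimal single-machine sketch in Python/NumPy. It is not the paper's implementation: the naive krr_fit shows where the Θ(n²) memory and Θ(n³) flops come from, and partitioned_krr stands in for the BKRR/KKRR scheme by splitting the data randomly (the paper uses balanced and k-means-based partitions), training one model per part, and keeping the model with the lowest validation error. All names and parameters (rbf_kernel, lam, gamma, p) are illustrative.

import numpy as np

def rbf_kernel(X, Z, gamma=0.1):
    # Gaussian (RBF) kernel matrix between the rows of X and Z.
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam=1e-3, gamma=0.1):
    # Naive KRR: forming the n-by-n kernel takes Theta(n^2) memory,
    # and the dense linear solve takes Theta(n^3) flops.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_test, gamma=0.1):
    return rbf_kernel(X_test, X_train, gamma) @ alpha

def partitioned_krr(X, y, X_val, y_val, p=4, lam=1e-3, gamma=0.1):
    # Partition-and-select (random split standing in for BKRR/KKRR):
    # each of the p parts holds n/p points, so one model costs
    # Theta((n/p)^2) memory and Theta((n/p)^3) flops, and the p models
    # can be trained independently, i.e., in parallel.
    rng = np.random.default_rng(0)
    parts = np.array_split(rng.permutation(len(X)), p)
    best_err, best_model = np.inf, None
    for idx in parts:
        alpha = krr_fit(X[idx], y[idx], lam, gamma)
        err = np.mean((krr_predict(X[idx], alpha, X_val, gamma) - y_val) ** 2)
        if err < best_err:
            best_err, best_model = err, (idx, alpha)
    return best_model, best_err

Keeping a single per-partition model, rather than combining all p of them, is what lets this family of methods trade a small amount of accuracy for near-perfect weak scaling: per-processor memory and flops stay constant as n and p grow together.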

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC02-05CH11231; SC0008700
OSTI ID:
1544213
Resource Relation:
Conference: 2018 International Conference on Supercomputing, Beijing (China), 12-15 Jun 2018
Country of Publication:
United States
Language:
English

