OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems

Conference
Authors: [1]; [1]; [2]; [3]
  1. Univ. of California, Berkeley, CA (United States)
  2. Univ. of California, Davis, CA (United States)
  3. Georgia Inst. of Technology, Atlanta, GA (United States)

Kernel Ridge Regression (KRR) is a fundamental method in machine learning. Given an n-by-d data matrix as input, a traditional implementation requires Θ(n²) memory to form an n-by-n kernel matrix and Θ(n³) flops to compute the final model. These time and storage costs prohibit KRR from scaling up to large datasets. For example, even on a relatively small dataset (a 520k-by-90 input requiring 357 MB), KRR requires 2 TB of memory just to store the kernel matrix; this is because n is usually much larger than d in real-world applications. Weak scaling is also a problem: if we keep d and n/p fixed as p grows (p is the number of machines), the memory needed grows as Θ(p) per processor and the flops as Θ(p²) per processor, since the kernel matrix has n² = (n/p)²·p² entries and the solve costs Θ(n³) flops, shared among p processors. In the perfect weak scaling situation, both the memory needed and the flops grow as Θ(1) per processor (i.e., memory and flops are constant). The traditional distributed KRR implementation (DKRR) achieved only 0.32% weak scaling efficiency from 96 to 1536 processors. In this work, we propose two new methods to address these problems: Balanced KRR (BKRR) and K-means KRR (KKRR). Both partition the input dataset into p different parts, generate p different models, and then select the best model among them. Compared to a conventional implementation, KKRR2 (an optimized version of KKRR) improves the weak scaling efficiency from 0.32% to 38% and achieves a 591x speedup for reaching the same accuracy with the same data and the same hardware (1536 processors). BKRR2 (an optimized version of BKRR) achieves higher accuracy than the current fastest method in less training time on a variety of datasets. For applications requiring only approximate solutions, BKRR2 improves the weak scaling efficiency to 92% and achieves a 3505x speedup (theoretical speedup: 4096x).
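To make the cost argument and the partition-and-select idea concrete, the following is a minimal single-machine sketch in Python/NumPy. It is not the paper's implementation: the naive krr_fit shows where the Θ(n²) memory and Θ(n³) flops come from, and partitioned_krr stands in for the BKRR/KKRR scheme by splitting the data randomly (the paper uses balanced and k-means-based partitions), training one model per part, and keeping the model with the lowest validation error. All names and parameters (rbf_kernel, lam, gamma, p) are illustrative.

import numpy as np

def rbf_kernel(X, Z, gamma=0.1):
    # Gaussian (RBF) kernel matrix between the rows of X and Z.
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam=1e-3, gamma=0.1):
    # Naive KRR: forming the n-by-n kernel takes Theta(n^2) memory,
    # and the dense linear solve takes Theta(n^3) flops.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_test, gamma=0.1):
    return rbf_kernel(X_test, X_train, gamma) @ alpha

def partitioned_krr(X, y, X_val, y_val, p=4, lam=1e-3, gamma=0.1):
    # Partition-and-select (random split standing in for BKRR/KKRR):
    # each of the p parts holds n/p points, so one model costs
    # Theta((n/p)^2) memory and Theta((n/p)^3) flops, and the p models
    # can be trained independently, i.e., in parallel.
    rng = np.random.default_rng(0)
    parts = np.array_split(rng.permutation(len(X)), p)
    best_err, best_model = np.inf, None
    for idx in parts:
        alpha = krr_fit(X[idx], y[idx], lam, gamma)
        err = np.mean((krr_predict(X[idx], alpha, X_val, gamma) - y_val) ** 2)
        if err < best_err:
            best_err, best_model = err, (idx, alpha)
    return best_model, best_err

Keeping a single per-partition model, rather than combining all p of them, is what lets this family of methods trade a small amount of accuracy for near-perfect weak scaling: per-processor memory and flops stay constant as n and p grow together.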

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC02-05CH11231; SC0008700
OSTI ID:
1544213
Resource Relation:
Conference: 2018 International Conference on Supercomputing, Beijing (China), 12-15 Jun 2018
Country of Publication:
United States
Language:
English

