OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark

Abstract

We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented the factorization both in C, as a parallel implementation with hand-tuned optimizations, and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark, we processed the 1TB dataset in under 30 minutes with 960 cores on every system, with the fastest times on the experimental Cray cluster. By comparison, the C implementation was 21X faster on the Amazon EC2 system, owing to careful cache optimizations, bandwidth-friendly matrix access, and vector computation using SIMD units. We report these results and their implications for the hardware and software issues that arise in supporting data-centric workloads in parallel and distributed environments.
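The abstract names the randomized CX factorization without spelling it out. As a rough illustration only, the NumPy sketch below shows one common form of it: estimate rank-k column leverage scores with a randomized range finder, sample c actual columns of A with probability proportional to those scores to form C, and set X = pinv(C) A so that A is approximately C X. This is not the authors' C or Spark/Scala code; the function names, the oversampling of 10, the power-iteration count q, and sampling without replacement are all assumptions made for the example.

```python
import numpy as np


def approx_leverage_scores(A, k, q=2, oversample=10, rng=None):
    """Approximate rank-k column leverage scores of an m x n matrix A.

    A randomized range finder with q power iterations estimates the top-k
    right singular vectors V_k (n x k); column j's score is ||V_k[j, :]||^2,
    so the scores are nonnegative and sum to k.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    ell = min(k + oversample, m, n)          # sketch width (assumed oversampling)
    # Sketch the row space of A: A^T @ Omega has shape (n, ell).
    Q, _ = np.linalg.qr(A.T @ rng.standard_normal((m, ell)))
    # Power iterations sharpen the sketch; re-orthonormalize after each product.
    for _ in range(q):
        Q, _ = np.linalg.qr(A @ Q)           # (m, ell)
        Q, _ = np.linalg.qr(A.T @ Q)         # (n, ell)
    # A is approximately (A Q) Q^T, so the right singular vectors of A are
    # approximately Q times the right singular vectors of the small matrix A Q.
    _, _, Wt = np.linalg.svd(A @ Q, full_matrices=False)
    Vk = Q @ Wt[:k].T                        # (n, k), orthonormal columns
    return np.sum(Vk ** 2, axis=1)


def randomized_cx(A, k, c, q=2, rng=None):
    """Return (column indices, C, X) with A approximately equal to C @ X.

    Samples c distinct columns with probability proportional to their
    approximate leverage scores, then solves X = pinv(C) @ A.
    """
    rng = np.random.default_rng(rng)
    lev = approx_leverage_scores(A, k, q=q, rng=rng)
    p = lev / lev.sum()                      # sampling distribution over columns
    cols = rng.choice(A.shape[1], size=c, replace=False, p=p)
    C = A[:, cols]                           # actual columns of A
    X = np.linalg.pinv(C) @ A                # least-squares coefficients
    return cols, C, X


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    # Exactly rank-5 test matrix, so c = 10 sampled columns should span it.
    A = gen.standard_normal((60, 5)) @ gen.standard_normal((5, 40))
    cols, C, X = randomized_cx(A, k=5, c=10, rng=1)
    err = np.linalg.norm(A - C @ X) / np.linalg.norm(A)
    print(sorted(cols.tolist()), f"relative error = {err:.1e}")
```

In this sketch the only steps that touch the full matrix are the products with A and A^T; the QR and SVD calls act on factors with just ell columns. Treating those products as the parallel passes over the data is an assumption about how such a computation might be distributed, not a description of the implementations evaluated in the paper.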

Authors:
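Gittens, Alex; Kottalam, Jey; Yang, Jiyan; Ringenburg, Michael F.; Chhugani, Jatin; Racah, Evan; Singh, Mohitdeep; Yao, Yushu; Fischer, Curt; Ruebel, Oliver; Bowen, Benjamin; Lewis, Norman G.; Mahoney, Michael W.; Krishnamurthy, Venkat; Prabhat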
Publication Date:
2017-07-27
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
Computational Research Division, National Energy Research Scientific Computing Division
OSTI Identifier:
1372901
Report Number(s):
LBNL-1005719
ir:1005719
Resource Type:
Conference
Country of Publication:
United States
Language:
English

Citation Formats

Gittens, Alex, Kottalam, Jey, Yang, Jiyan, Ringenburg, Michael F., Chhugani, Jatin, Racah, Evan, Singh, Mohitdeep, Yao, Yushu, Fischer, Curt, Ruebel, Oliver, Bowen, Benjamin, Lewis, Norman G., Mahoney, Michael W., Krishnamurthy, Venkat, and Prabhat. A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark. United States: N. p., 2017. Web. doi:10.1109/IPDPSW.2016.114.
Gittens, Alex, Kottalam, Jey, Yang, Jiyan, Ringenburg, Michael F., Chhugani, Jatin, Racah, Evan, Singh, Mohitdeep, Yao, Yushu, Fischer, Curt, Ruebel, Oliver, Bowen, Benjamin, Lewis, Norman G., Mahoney, Michael W., Krishnamurthy, Venkat, & Prabhat. (2017). A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark. United States. doi:10.1109/IPDPSW.2016.114.
Gittens, Alex, Kottalam, Jey, Yang, Jiyan, Ringenburg, Michael F., Chhugani, Jatin, Racah, Evan, Singh, Mohitdeep, Yao, Yushu, Fischer, Curt, Ruebel, Oliver, Bowen, Benjamin, Lewis, Norman G., Mahoney, Michael W., Krishnamurthy, Venkat, and Prabhat. 2017. "A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark". United States. doi:10.1109/IPDPSW.2016.114. https://www.osti.gov/servlets/purl/1372901.
@inproceedings{osti_1372901,
title = {A multi-platform evaluation of the randomized {CX} low-rank matrix factorization in {Spark}},
author = {Gittens, Alex and Kottalam, Jey and Yang, Jiyan and Ringenburg, Michael F. and Chhugani, Jatin and Racah, Evan and Singh, Mohitdeep and Yao, Yushu and Fischer, Curt and Ruebel, Oliver and Bowen, Benjamin and Lewis, Norman G. and Mahoney, Michael W. and Krishnamurthy, Venkat and Prabhat},
abstractNote = {We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented the factorization both in C, as a parallel implementation with hand-tuned optimizations, and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark, we processed the 1TB dataset in under 30 minutes with 960 cores on every system, with the fastest times on the experimental Cray cluster. By comparison, the C implementation was 21X faster on the Amazon EC2 system, owing to careful cache optimizations, bandwidth-friendly matrix access, and vector computation using SIMD units. We report these results and their implications for the hardware and software issues that arise in supporting data-centric workloads in parallel and distributed environments.},
doi = {10.1109/IPDPSW.2016.114},
place = {United States},
year = {2017},
month = jul
}

Conference: