OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark

Abstract

We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with the fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.
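The CX decomposition approximates a matrix A by a small set of its actual columns C together with a coefficient matrix X, so that A ≈ CX, with columns typically sampled according to statistical leverage scores. As a minimal NumPy sketch of that idea (an illustration only, not the paper's tuned Spark or C implementations; the function name, sampling details, and parameters here are assumptions):

```python
# Sketch of a randomized CX decomposition, A ~= C @ X, where C consists of
# actual columns of A chosen by rank-k leverage-score sampling.
# Illustrative only; not the optimized implementations evaluated in the paper.
import numpy as np

def cx_decomposition(A, k, c, rng=np.random.default_rng(0)):
    """Select c columns of A by rank-k leverage scores; return C, X, and
    the sampled column indices, with A approximately equal to C @ X."""
    # Rank-k leverage scores: squared column norms of the top-k right
    # singular vectors of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(Vt[:k, :] ** 2, axis=0)
    p = lev / lev.sum()
    # Sample c distinct column indices with probability proportional
    # to their leverage scores.
    idx = rng.choice(A.shape[1], size=c, replace=False, p=p)
    C = A[:, idx]
    # X = C^+ A is the least-squares fit of A in the span of the chosen columns.
    X = np.linalg.pinv(C) @ A
    return C, X, idx
```

For a matrix of exact rank k, sampling c ≥ k columns this way generically recovers the column space, so the reconstruction error is near zero; on real data the error degrades gracefully with c.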

Authors:
Gittens, Alex; Kottalam, Jey; Yang, Jiyan; Ringenburg, Michael F.; Chhugani, Jatin; Racah, Evan; Singh, Mohitdeep; Yao, Yushu; Fischer, Curt; Ruebel, Oliver; Bowen, Benjamin; Lewis, Norman G.; Mahoney, Michael W.; Krishnamurthy, Venkat; Prabhat
Publication Date:
July 2017
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
Computational Research Division; National Energy Research Scientific Computing Center
OSTI Identifier:
1372901
Report Number(s):
LBNL-1005719
ir:1005719
Resource Type:
Conference
Country of Publication:
United States
Language:
English

Citation Formats

Gittens, Alex, Kottalam, Jey, Yang, Jiyan, Ringenburg, Michael F., Chhugani, Jatin, Racah, Evan, Singh, Mohitdeep, Yao, Yushu, Fischer, Curt, Ruebel, Oliver, Bowen, Benjamin, Lewis, Norman G., Mahoney, Michael W., Krishnamurthy, Venkat, and Prabhat. A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark. United States: N. p., 2017. Web. doi:10.1109/IPDPSW.2016.114.
Gittens, Alex, Kottalam, Jey, Yang, Jiyan, Ringenburg, Michael F., Chhugani, Jatin, Racah, Evan, Singh, Mohitdeep, Yao, Yushu, Fischer, Curt, Ruebel, Oliver, Bowen, Benjamin, Lewis, Norman G., Mahoney, Michael W., Krishnamurthy, Venkat, & Prabhat. A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark. United States. doi:10.1109/IPDPSW.2016.114.
Gittens, Alex, Kottalam, Jey, Yang, Jiyan, Ringenburg, Michael F., Chhugani, Jatin, Racah, Evan, Singh, Mohitdeep, Yao, Yushu, Fischer, Curt, Ruebel, Oliver, Bowen, Benjamin, Lewis, Norman G., Mahoney, Michael W., Krishnamurthy, Venkat, and Prabhat. 2017. "A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark". United States. doi:10.1109/IPDPSW.2016.114. https://www.osti.gov/servlets/purl/1372901.
@inproceedings{osti_1372901,
title = {A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark},
author = {Gittens, Alex and Kottalam, Jey and Yang, Jiyan and Ringenburg, Michael F. and Chhugani, Jatin and Racah, Evan and Singh, Mohitdeep and Yao, Yushu and Fischer, Curt and Ruebel, Oliver and Bowen, Benjamin and Lewis, Norman G. and Mahoney, Michael W. and Krishnamurthy, Venkat and Prabhat},
abstractNote = {We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with the fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.},
doi = {10.1109/IPDPSW.2016.114},
place = {United States},
year = {2017},
month = {7}
}


Similar Records:
  • Rank tests provide an alternative to the usual normal theory F-test for the analysis of data from randomized complete blocks experiments. Two such rank tests are the Friedman test, which employs the method of n-rankings, and the rank transformation procedure, which employs an overall ranking of the data. In this paper the asymptotic efficiency of the rank transformation procedure is developed and compared to the asymptotic efficiencies of Friedman's test and the usual F-test. These efficiencies are developed using contiguous alternatives that are shifts in location. Comparisons among the three tests are made using normal, Student, and double exponential within-block distributions. Block effects are introduced by drawing location shifts from normal and uniform distributions and, also, by drawing scale changes from an inverted gamma density. The asymptotic relative efficiencies were evaluated using numerical procedures.
  • We analyze the parallel performance of randomized interpolative decomposition by decomposing low rank complex-valued Gaussian random matrices larger than 100 GB. We chose a Cray XMT supercomputer as it provides an almost ideal PRAM model permitting quick investigation of parallel algorithms without obfuscation from hardware idiosyncrasies. We find that on non-square matrices performance scales almost linearly, with runtime about 100 times faster on 128 processors. We also verify that numerically discovered error bounds still hold on matrices two orders of magnitude larger than those previously tested.
  • This paper presents a new algorithm for computing the QR factorization of a rank-deficient matrix that is well suited for high-performance machines. These machines typically employ a memory hierarchy, and matrix-matrix operations perform better on them than matrix-vector or vector-vector operations since they require significantly less data movement per floating point operation. The traditional QR factorization algorithm with column pivoting is not well suited for such environments since it precludes the use of matrix-matrix operations. Instead, we suggest a restricted pivoting strategy based on incremental condition estimation which allows us to formulate a block QR factorization algorithm where the bulk of the work is in matrix-matrix operations. Performance results on the Cray 2, Cray X-MP and Cray Y-MP show that the new algorithm performs significantly better than the traditional scheme and can more than halve the cost of computing the QR factorization. 19 refs., 1 fig., 1 tab.
  • We study randomized techniques for designing efficient algorithms on a p-processor bulk-synchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processor-to-processor communication rounds provided each processor is guaranteed to send and receive at most h items in any round. The measure of efficiency we use is in terms of the internal computation time of the processors and the number of communication rounds needed to solve the problem at hand. We present techniques that achieve optimal efficiency in these bounds over all possible values for p, and we call such techniques fully-scalable for this reason. In particular, we address two fundamental problems: multi-searching and convex hull construction. Our methods result in algorithms that use internal time that is O(n log n/p) and, for h = {Theta}(n/p), a number of communication rounds that is O(log n/log (h + 1)) with high probability. Both of these bounds are asymptotically optimal for the BSP model.