DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: SpaRC: scalable sequence clustering using Apache Spark

Authors:
 Shi, Lizhen [1]; Meng, Xiandong [2]; Tseng, Elizabeth [3]; Mascagni, Michael [1]; Wang, Zhong [4]
  1. Department of Computer Science, School of Computer Science, Florida State University, Tallahassee, FL, USA
  2. US Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  3. Pacific Biosciences Inc, Menlo Park, CA, USA
  4. US Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; School of Natural Sciences, University of California at Merced, Merced, CA, USA
Publication Date:
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
OSTI Identifier:
1471135
Grant/Contract Number:
AC02-05CH11231
Resource Type:
Publisher's Accepted Manuscript
Journal Name:
Bioinformatics
Additional Journal Information:
Journal Name: Bioinformatics; Journal Volume: 35; Journal Issue: 5; Journal ID: ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English

Citation Formats

Shi, Lizhen, Meng, Xiandong, Tseng, Elizabeth, Mascagni, Michael, Wang, Zhong, and Birol, Inanc. SpaRC: scalable sequence clustering using Apache Spark. United Kingdom: N. p., 2018. Web. doi:10.1093/bioinformatics/bty733.
Shi, Lizhen, Meng, Xiandong, Tseng, Elizabeth, Mascagni, Michael, Wang, Zhong, & Birol, Inanc. SpaRC: scalable sequence clustering using Apache Spark. United Kingdom. doi:10.1093/bioinformatics/bty733.
Shi, Lizhen, Meng, Xiandong, Tseng, Elizabeth, Mascagni, Michael, Wang, Zhong, and Birol, Inanc. Thu . "SpaRC: scalable sequence clustering using Apache Spark". United Kingdom. doi:10.1093/bioinformatics/bty733.
@article{osti_1471135,
title = {SpaRC: scalable sequence clustering using Apache Spark},
author = {Shi, Lizhen and Meng, Xiandong and Tseng, Elizabeth and Mascagni, Michael and Wang, Zhong and Birol, Inanc},
abstractNote = {},
doi = {10.1093/bioinformatics/bty733},
journal = {Bioinformatics},
number = {5},
volume = {35},
place = {United Kingdom},
year = {2018},
month = {8}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
DOI: 10.1093/bioinformatics/bty733

Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science

Works referenced in this record:

Accurate and comprehensive sequencing of personal genomes
journal, July 2011

  • Ajay, S. S.; Parker, S. C. J.; Ozel Abaan, H.
  • Genome Research, Vol. 21, Issue 9
  • DOI: 10.1101/gr.123638.111

A framework for space-efficient read clustering in metagenomic samples
journal, March 2017


Detection of low-abundance bacterial strains in metagenomic datasets by eigengenome partitioning
journal, September 2015

  • Cleary, Brian; Brito, Ilana Lauren; Huang, Katherine
  • Nature Biotechnology, Vol. 33, Issue 10
  • DOI: 10.1038/nbt.3329

SparkBLAST: scalable BLAST processing using in-memory operations
journal, June 2017

  • de Castro, Marcelo Rodrigo; Tostes, Catherine dos Santos; Dávila, Alberto M. R.
  • BMC Bioinformatics, Vol. 18, Issue 1
  • DOI: 10.1186/s12859-017-1723-8

KMC 2: fast and resource-frugal k-mer counting
journal, January 2015


Widespread Polycistronic Transcripts in Fungi Revealed by Single-Molecule mRNA Sequencing
journal, July 2015


DIME: A Novel Framework for De Novo Metagenomic Sequence Assembly
journal, February 2015

  • Guo, Xuan; Yu, Ning; Ding, Xiaojun
  • Journal of Computational Biology, Vol. 22, Issue 2
  • DOI: 10.1089/cmb.2014.0251

Metagenomic Discovery of Biomass-Degrading Genes and Genomes from Cow Rumen
journal, January 2011


Tackling soil diversity with the assembly of large, complex metagenomes
journal, March 2014

  • Howe, Adina Chuang; Jansson, Janet K.; Malfatti, Stephanie A.
  • Proceedings of the National Academy of Sciences, Vol. 111, Issue 13
  • DOI: 10.1073/pnas.1402564111

Counting the Uncountable: Statistical Approaches to Estimating Microbial Diversity
journal, October 2001


Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark
journal, September 2016


MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
journal, January 2015


Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences
journal, March 2016


A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
journal, January 2011


Next-generation transcriptome assembly
journal, September 2011

  • Martin, Jeffrey A.; Wang, Zhong
  • Nature Reviews Genetics, Vol. 12, Issue 10
  • DOI: 10.1038/nrg3068

A near complete snapshot of the Zea mays seedling transcriptome revealed from ultra-deep sequencing
journal, March 2014

  • Martin, Jeffrey A.; Johnson, Nicole V.; Gross, Stephen M.
  • Scientific Reports, Vol. 4, Issue 1
  • DOI: 10.1038/srep04519

Assembly algorithms for next-generation sequencing data
journal, June 2010


metaSPAdes: a new versatile metagenomic assembler
journal, March 2017

  • Nurk, Sergey; Meleshko, Dmitry; Korobeynikov, Anton
  • Genome Research, Vol. 27, Issue 5
  • DOI: 10.1101/gr.213959.116

Near linear time algorithm to detect community structures in large-scale networks
journal, September 2007


DSK: k-mer counting with very low memory usage
journal, January 2013


Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software
journal, October 2017

  • Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter
  • Nature Methods, Vol. 14, Issue 11
  • DOI: 10.1038/nmeth.4458

A case study of tuning MapReduce for efficient Bioinformatics in the cloud
journal, January 2017


Methane yield phenotypes linked to differential gene expression in the sheep rumen microbiome
journal, June 2014

  • Shi, Weibing; Moon, Christina D.; Leahy, Sinead C.
  • Genome Research, Vol. 24, Issue 9
  • DOI: 10.1101/gr.168245.113

Next generation sequencing data of a defined microbial mock community
journal, September 2016

  • Singer, Esther; Andreopoulos, Bill; Bowers, Robert M.
  • Scientific Data, Vol. 3, Issue 1
  • DOI: 10.1038/sdata.2016.81

Structure and function of the global ocean microbiome
journal, May 2015


Metagenomics: DNA sequencing of environmental samples
journal, October 2005

  • Tringe, Susannah Green; Rubin, Edward M.
  • Nature Reviews Genetics, Vol. 6, Issue 11
  • DOI: 10.1038/nrg1709

MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample
journal, September 2012