DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Computational Strategies for Scalable Genomics Analysis

Abstract

The revolution in next-generation DNA sequencing technologies is driving explosive data growth in genomics, posing a significant challenge to the computing infrastructure and software algorithms used for genomics analysis. Various big data technologies have been explored to scale current bioinformatics solutions up or out so that they can mine large genomics datasets. In this review, we survey some of these developments in applying parallel distributed computing and specialized hardware to genomics. We comment on the pros and cons of each strategy in terms of ease of development, robustness, scalability, and efficiency. Although this review is written for readers in the genomics and bioinformatics fields, it may also be informative for computer scientists interested in genomics applications.

Authors:
Shi, Lizhen [1]; Wang, Zhong [2]
  1. Florida State Univ., Tallahassee, FL (United States)
  2. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Merced, CA (United States)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1599823
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Genes
Additional Journal Information:
Journal Volume: 10; Journal Issue: 12; Journal ID: ISSN 2073-4425
Publisher:
MDPI
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; scalable genomics analysis; big data; high performance computing; cloud computing

Citation Formats

Shi, Lizhen, and Wang, Zhong. Computational Strategies for Scalable Genomics Analysis. United States: N. p., 2019. Web. doi:10.3390/genes10121017.
Shi, Lizhen, & Wang, Zhong. Computational Strategies for Scalable Genomics Analysis. United States. doi:10.3390/genes10121017.
Shi, Lizhen, and Wang, Zhong. "Computational Strategies for Scalable Genomics Analysis". United States. doi:10.3390/genes10121017. https://www.osti.gov/servlets/purl/1599823.
@article{osti_1599823,
title = {Computational Strategies for Scalable Genomics Analysis},
author = {Shi, Lizhen and Wang, Zhong},
abstractNote = {The revolution in next-generation DNA sequencing technologies is leading to explosive data growth in genomics, posing a significant challenge to the computing infrastructure and software algorithms for genomics analysis. Various big data technologies have been explored to scale up/out current bioinformatics solutions to mine the big genomics data. In this review, we survey some of these exciting developments in the applications of parallel distributed computing and special hardware to genomics. We comment on the pros and cons of each strategy in the context of ease of development, robustness, scalability, and efficiency. Although this review is written for an audience from the genomics and bioinformatics fields, it may also be informative for the audience of computer science with interests in genomics applications.},
doi = {10.3390/genes10121017},
journal = {Genes},
number = 12,
volume = 10,
place = {United States},
year = {2019},
month = {12}
}
