Title: BioPig: Developing Cloud Computing Applications for Next-Generation Sequence Analysis

Abstract

Next-generation sequencing is producing ever larger data sets, with a growth rate outpacing Moore's Law. The data deluge has made many current sequence analysis tools obsolete because they do not scale with data. Here we present BioPig, a collection of cloud computing tools to scale data analysis and management. Pig is a flexible data scripting language that uses Apache Hadoop's data structures and MapReduce framework to process very large data files in parallel and combine the results. BioPig extends Pig with sequence analysis capabilities. We will show the performance of BioPig on a variety of bioinformatics tasks, including screening sequence contaminants, Illumina QA/QC, and gene discovery from metagenome data sets, using the rumen metagenome as an example.
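
The map-and-combine pattern described in the abstract can be illustrated with k-mer counting, one of the subject keywords of this record. The following is only a minimal single-process sketch in plain Python, not BioPig or Pig code; the function names (map_kmers, reduce_counts, count_kmers) and the sample reads are hypothetical and chosen for illustration. In a real Hadoop job the map step would run in parallel across many chunks of the input.

```python
from collections import Counter
from functools import reduce


def map_kmers(read: str, k: int) -> Counter:
    """Map step (illustrative): tally every k-mer in one read."""
    read = read.upper()
    # A read shorter than k yields an empty range, hence an empty tally.
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step (illustrative): merge two partial tallies."""
    return left + right


def count_kmers(reads, k: int) -> Counter:
    """Apply the map step to each read, then combine the partial results."""
    partial_tallies = (map_kmers(read, k) for read in reads)
    return reduce(reduce_counts, partial_tallies, Counter())


# Hypothetical sample reads, not taken from any data set in the record.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads, 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

Because each read is tallied independently and the merge is associative, the map step could be distributed across workers without changing the result, which is the property that frameworks such as Hadoop rely on.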

Authors:
Bhatia, Karan; Wang, Zhong
Publication Date:
2011-03-22
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
Genomics Division
OSTI Identifier:
1050659
Report Number(s):
LBNL-4784E-Poster
TRN: US201218%%871
DOE Contract Number:  
DE-AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: 6th Annual DOE JGI User Meeting, Walnut Creek, CA, 3/22-24
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICAL METHODS AND COMPUTING; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; CLOUDS; DATA ANALYSIS; GENES; MANAGEMENT; PERFORMANCE; RUMINANTS; STOMACH; STRUCTURAL CHEMICAL ANALYSIS; BioPig, cloud computing, kmers, ngrams, sequence analysis

Citation Formats

Bhatia, Karan, and Wang, Zhong. BioPig: Developing Cloud Computing Applications for Next-Generation Sequence Analysis. United States: N. p., 2011. Web.
Bhatia, Karan, & Wang, Zhong. BioPig: Developing Cloud Computing Applications for Next-Generation Sequence Analysis. United States.
Bhatia, Karan, and Wang, Zhong. 2011. "BioPig: Developing Cloud Computing Applications for Next-Generation Sequence Analysis". United States. https://www.osti.gov/servlets/purl/1050659.
@article{osti_1050659,
title = {BioPig: Developing Cloud Computing Applications for Next-Generation Sequence Analysis},
author = {Bhatia, Karan and Wang, Zhong},
abstractNote = {Next-generation sequencing is producing ever larger data sets, with a growth rate outpacing Moore's Law. The data deluge has made many current sequence analysis tools obsolete because they do not scale with data. Here we present BioPig, a collection of cloud computing tools to scale data analysis and management. Pig is a flexible data scripting language that uses Apache Hadoop's data structures and MapReduce framework to process very large data files in parallel and combine the results. BioPig extends Pig with sequence analysis capabilities. We will show the performance of BioPig on a variety of bioinformatics tasks, including screening sequence contaminants, Illumina QA/QC, and gene discovery from metagenome data sets, using the rumen metagenome as an example.},
url = {https://www.osti.gov/biblio/1050659},
place = {United States},
year = {2011},
month = mar
}
