skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: JGI Plant Genomics Gene Annotation Pipeline

Abstract

Plant genomes vary in size and are highly complex with a high amount of repeats, genome duplication and tandem duplication. Gene encodes a wealth of information useful in studying organism and it is critical to have high quality and stable gene annotation. Thanks to advancement of sequencing technology, many plant species genomes have been sequenced and transcriptomes are also sequenced. To use these vastly large amounts of sequence data to make gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim with aid of a RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via JGI Phytozome portal whose URL and front page snapshot are shown below.

Authors:
; ; ; ;
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
Genomics Division
OSTI Identifier:
1241222
Report Number(s):
LBNL-7040E
DOE Contract Number:
DE-AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: American Society of Plant Biologists, Portland, Oregon, 2014
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS

Citation Formats

Shu, Shengqiang, Rokhsar, Dan, Goodstein, David, Hayes, David, and Mitros, Therese. JGI Plant Genomics Gene Annotation Pipeline. United States: N. p., 2014. Web.
Shu, Shengqiang, Rokhsar, Dan, Goodstein, David, Hayes, David, & Mitros, Therese. JGI Plant Genomics Gene Annotation Pipeline. United States.
Shu, Shengqiang, Rokhsar, Dan, Goodstein, David, Hayes, David, and Mitros, Therese. Mon . "JGI Plant Genomics Gene Annotation Pipeline". United States. doi:. https://www.osti.gov/servlets/purl/1241222.
@article{osti_1241222,
title = {JGI Plant Genomics Gene Annotation Pipeline},
author = {Shu, Shengqiang and Rokhsar, Dan and Goodstein, David and Hayes, David and Mitros, Therese},
abstractNote = {Plant genomes vary in size and are highly complex with a high amount of repeats, genome duplication and tandem duplication. Gene encodes a wealth of information useful in studying organism and it is critical to have high quality and stable gene annotation. Thanks to advancement of sequencing technology, many plant species genomes have been sequenced and transcriptomes are also sequenced. To use these vastly large amounts of sequence data to make gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim with aid of a RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via JGI Phytozome portal whose URL and front page snapshot are shown below.},
doi = {},
journal = {},
number = ,
volume = ,
place = {United States},
year = {Mon Jul 14 00:00:00 EDT 2014},
month = {Mon Jul 14 00:00:00 EDT 2014}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Save / Share:
  • After publication of this study [1], we noticed that the Strain ID summary was not removed from the PDF due to a copyediting error. The original version of this article was corrected. The publisher apologizes for any inconvenience caused.
  • The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provide d via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation ismore » followed by functional annotation including assignment of protein product names and connection to various protein family databases.« less
  • The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.
  • © 2016 Huntemann et al. The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.