DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Evolutionary characteristics of intergenic transcribed regions indicate rare novel genes and widespread noisy transcription in the Poaceae

Abstract

Extensive transcriptional activity occurring in intergenic regions of genomes has raised the question whether intergenic transcription represents the activity of novel genes or noisy expression. To address this, we evaluated cross-species and post-duplication sequence and expression conservation of intergenic transcribed regions (ITRs) in four Poaceae species. Among 43,301 ITRs across the four species, 34,460 (80%) are species-specific. ITRs found across species tend to be more divergent in expression and have more recent duplicates compared to annotated genes. To assess if ITRs are functional (under selection), machine learning models were established in Oryza sativa (rice) that could accurately distinguish between phenotype genes and pseudogenes (area under curve-receiver operating characteristic = 0.94). Based on the models, 584 (8%) and 4391 (61%) rice ITRs are classified as likely functional and nonfunctional with high confidence, respectively. ITRs with conserved expression and ancient retained duplicates, features that were not part of the model, are frequently classified as likely-functional, suggesting these characteristics could serve as pragmatic rules of thumb for identifying candidate sequences likely to be under selection. This study also provides a framework to identify novel genes using comparative transcriptomic data to improve genome annotation that is fundamental for connecting genotype to phenotype in crop and model systems.

Authors:
Lloyd, John P. [1]; Bowman, Megan J. [2]; Azodi, Christina B. [2]; Sowers, Rosalie P. [3]; Moghe, Gaurav D. [4]; Childs, Kevin L. [2]; Shiu, Shin-Han [2]
  1. Michigan State Univ., East Lansing, MI (United States); Univ. of Michigan, Ann Arbor, MI (United States)
  2. Michigan State Univ., East Lansing, MI (United States)
  3. Pennsylvania State Univ., University Park, PA (United States)
  4. Cornell Univ., Ithaca, NY (United States)
Publication Date:
Research Org.:
Michigan State Univ., East Lansing, MI (United States). Great Lakes Bioenergy Research Center
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
OSTI Identifier:
1579362
Grant/Contract Number:  
SC0018409; IOS-1546617; DEB-1655386
Resource Type:
Accepted Manuscript
Journal Name:
Scientific Reports
Additional Journal Information:
Journal Volume: 9; Journal Issue: 1; Journal ID: ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Lloyd, John P., Bowman, Megan J., Azodi, Christina B., Sowers, Rosalie P., Moghe, Gaurav D., Childs, Kevin L., and Shiu, Shin-Han. Evolutionary characteristics of intergenic transcribed regions indicate rare novel genes and widespread noisy transcription in the Poaceae. United States: N. p., 2019. Web. doi:10.1038/s41598-019-47797-y.
Lloyd, John P., Bowman, Megan J., Azodi, Christina B., Sowers, Rosalie P., Moghe, Gaurav D., Childs, Kevin L., & Shiu, Shin-Han. Evolutionary characteristics of intergenic transcribed regions indicate rare novel genes and widespread noisy transcription in the Poaceae. United States. doi:10.1038/s41598-019-47797-y.
Lloyd, John P., Bowman, Megan J., Azodi, Christina B., Sowers, Rosalie P., Moghe, Gaurav D., Childs, Kevin L., and Shiu, Shin-Han. Tue . "Evolutionary characteristics of intergenic transcribed regions indicate rare novel genes and widespread noisy transcription in the Poaceae". United States. doi:10.1038/s41598-019-47797-y. https://www.osti.gov/servlets/purl/1579362.
@article{osti_1579362,
title = {Evolutionary characteristics of intergenic transcribed regions indicate rare novel genes and widespread noisy transcription in the Poaceae},
author = {Lloyd, John P. and Bowman, Megan J. and Azodi, Christina B. and Sowers, Rosalie P. and Moghe, Gaurav D. and Childs, Kevin L. and Shiu, Shin-Han},
abstractNote = {Extensive transcriptional activity occurring in intergenic regions of genomes has raised the question whether intergenic transcription represents the activity of novel genes or noisy expression. To address this, we evaluated cross-species and post-duplication sequence and expression conservation of intergenic transcribed regions (ITRs) in four Poaceae species. Among 43,301 ITRs across the four species, 34,460 (80%) are species-specific. ITRs found across species tend to be more divergent in expression and have more recent duplicates compared to annotated genes. To assess if ITRs are functional (under selection), machine learning models were established in Oryza sativa (rice) that could accurately distinguish between phenotype genes and pseudogenes (area under curve-receiver operating characteristic = 0.94). Based on the models, 584 (8%) and 4391 (61%) rice ITRs are classified as likely functional and nonfunctional with high confidence, respectively. ITRs with conserved expression and ancient retained duplicates, features that were not part of the model, are frequently classified as likely-functional, suggesting these characteristics could serve as pragmatic rules of thumb for identifying candidate sequences likely to be under selection. This study also provides a framework to identify novel genes using comparative transcriptomic data to improve genome annotation that is fundamental for connecting genotype to phenotype in crop and model systems.},
doi = {10.1038/s41598-019-47797-y},
journal = {Scientific Reports},
number = 1,
volume = 9,
place = {United States},
year = {2019},
month = {8}
}
