DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Genome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass

Abstract

Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in the presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data.
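
The abstract describes testing marker effects across several imputes. A common way to combine per-impute results is Rubin's rules, and the sketch below illustrates that pooling step for a single marker. It is an illustrative assumption, not the authors' exact procedure: the function name pool_rubin and the example numbers are hypothetical, and the degrees of freedom use the classical large-sample formula.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one marker effect across m imputed data sets with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputes with matching variances")
    q_bar = mean(estimates)          # pooled effect estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    # Classical large-sample degrees of freedom; infinite if all imputes agree exactly.
    df = inf if b == 0 else (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return {"estimate": q_bar, "variance": total, "df": df, "t": q_bar / sqrt(total)}


# Hypothetical effect estimates and squared standard errors from m = 3 imputes.
pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled)
```

The pooled t statistic would then be compared with a t distribution on df degrees of freedom. When imputes disagree strongly, the between-imputation variance inflates the total variance, which is how imputation uncertainty lowers significance.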

Authors:
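 Ramstein, Guillaume P. [1]; Lipka, Alexander E. [2]; Lu, Fei [2]; Costich, Denise E. [3]; Cherney, Jerome H. [4]; Buckler, Edward S. [5]; Casler, Michael D. [1]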
  1. Univ. of Wisconsin, Madison, WI (United States). Dept. of Agronomy
  2. Univ. of Wisconsin, Madison, WI (United States). Inst. for Genomic Diversity
  3. Cornell Univ., Ithaca, NY (United States). Dept. of Plant Breeding and Genetics
  4. Cornell Univ., Ithaca, NY (United States). School of Integrative Plant Science. Soil and Crops Section
  5. US Dept. of Agriculture (USDA), Ithaca, NY (United States). Agricultural Research Service (ARS); US Dept. of Agriculture (USDA), Madison, WI (United States). Agricultural Research Service (ARS)
Publication Date:
2015-03-12
Research Org.:
US Dept. of Agriculture (USDA), Washington, DC (United States). Agricultural Research Service (ARS)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1627953
Grant/Contract Number:  
AI02-07ER64454
Resource Type:
Accepted Manuscript
Journal Name:
G3
Additional Journal Information:
Journal Volume: 5; Journal Issue: 5; Journal ID: ISSN 2160-1836
Publisher:
Genetics Society of America
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Genetics & Heredity; genome-wide association study; multiple imputation; genotyping by sequencing; bioenergy; Phalaris spp

Citation Formats

Ramstein, Guillaume P., Lipka, Alexander E., Lu, Fei, Costich, Denise E., Cherney, Jerome H., Buckler, Edward S., and Casler, Michael D. Genome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass. United States: N. p., 2015. Web. doi:10.1534/g3.115.017533.
Ramstein, Guillaume P., Lipka, Alexander E., Lu, Fei, Costich, Denise E., Cherney, Jerome H., Buckler, Edward S., & Casler, Michael D. Genome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass. United States. https://doi.org/10.1534/g3.115.017533
Ramstein, Guillaume P., Lipka, Alexander E., Lu, Fei, Costich, Denise E., Cherney, Jerome H., Buckler, Edward S., and Casler, Michael D. 2015. "Genome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass". United States. https://doi.org/10.1534/g3.115.017533. https://www.osti.gov/servlets/purl/1627953.
@article{osti_1627953,
title = {Genome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass},
author = {Ramstein, Guillaume P. and Lipka, Alexander E. and Lu, Fei and Costich, Denise E. and Cherney, Jerome H. and Buckler, Edward S. and Casler, Michael D.},
abstractNote = {Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in the presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only for rare cases when missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data.},
doi = {10.1534/g3.115.017533},
journal = {G3},
number = 5,
volume = 5,
place = {United States},
year = {2015},
month = {3}
}

Works referenced in this record:

Lignin Biosynthesis
journal, June 2003


Imputation methods to improve inference in SNP association studies
journal, January 2006

  • Dai, James Y.; Ruczinski, Ingo; LeBlanc, Michael
  • Genetic Epidemiology, Vol. 30, Issue 8
  • DOI: 10.1002/gepi.20180

Early Trials and Use of Reed Canary Grass as a Forage Plant
journal, January 1931


Mixed linear model approach adapted for genome-wide association studies
journal, March 2010

  • Zhang, Zhiwu; Ersoz, Elhan; Lai, Chao-Qiang
  • Nature Genetics, Vol. 42, Issue 4
  • DOI: 10.1038/ng.546

Variance component model to account for sample structure in genome-wide association studies
journal, March 2010

  • Kang, Hyun Min; Sul, Jae Hoon; Service, Susan K.
  • Nature Genetics, Vol. 42, Issue 4
  • DOI: 10.1038/ng.548

Genotype and SNP calling from next-generation sequencing data
journal, May 2011

  • Nielsen, Rasmus; Paul, Joshua S.; Albrechtsen, Anders
  • Nature Reviews Genetics, Vol. 12, Issue 6
  • DOI: 10.1038/nrg2986

Phylogeny of the tribe Aveneae (Pooideae, Poaceae) inferred from plastid trnT-F and nuclear ITS sequences
journal, September 2007

  • Quintanar, A.; Castroviejo, S.; Catalan, P.
  • American Journal of Botany, Vol. 94, Issue 9
  • DOI: 10.3732/ajb.94.9.1554

Large multi-gene phylogenetic trees of the grasses (Poaceae): Progress towards complete tribal and generic level sampling
journal, May 2008

  • Bouchenak-Khelladi, Yanis; Salamin, Nicolas; Savolainen, Vincent
  • Molecular Phylogenetics and Evolution, Vol. 47, Issue 2
  • DOI: 10.1016/j.ympev.2008.01.035

The interacting effects of temperature and plant community type on nutrient removal in wetland microcosms
journal, June 2005


Switchgrass as a sustainable bioenergy crop
journal, April 1996


Biomass Yield of Naturalized Populations and Cultivars of Reed Canary Grass
journal, August 2009

  • Casler, Michael D.; Cherney, Jerome H.; Brummer, E. Charles
  • BioEnergy Research, Vol. 2, Issue 3
  • DOI: 10.1007/s12155-009-9043-0

Switchgrass Genomic Diversity, Ploidy, and Evolution: Novel Insights from a Network-Based SNP Discovery Protocol
journal, January 2013


Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
journal, September 1997

  • Altschul, Stephen F.; Madden, Thomas L.; Schäffer, Alejandro A.
  • Nucleic Acids Research, Vol. 25, Issue 17, p. 3389-3402
  • DOI: 10.1093/nar/25.17.3389

Fully conditional specification in multivariate imputation
journal, December 2006

  • Van Buuren, S.; Brand, J. P. L.; Groothuis-Oudshoorn, C. G. M.
  • Journal of Statistical Computation and Simulation, Vol. 76, Issue 12
  • DOI: 10.1080/10629360600810434

Recursive partitioning for missing data imputation in the presence of interaction effects
journal, April 2014


Use of Multiple Imputation in the Epidemiologic Literature
journal, June 2008

  • Klebanoff, M. A.; Cole, S. R.
  • American Journal of Epidemiology, Vol. 168, Issue 4
  • DOI: 10.1093/aje/kwn071

Principal components analysis corrects for stratification in genome-wide association studies
journal, July 2006

  • Price, Alkes L.; Patterson, Nick J.; Plenge, Robert M.
  • Nature Genetics, Vol. 38, Issue 8
  • DOI: 10.1038/ng1847

Evaluation of Acid-Insoluble Ash as a Natural Marker in Ruminant Digestibility Studies
journal, February 1977


Multiple imputation of discrete and continuous data by fully conditional specification
journal, June 2007


Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP
journal, January 2011


Multiple Imputation for Missing Data via Sequential Regression Trees
journal, September 2010

  • Burgette, L. F.; Reiter, J. P.
  • American Journal of Epidemiology, Vol. 172, Issue 9
  • DOI: 10.1093/aje/kwq260

A comparison of approaches to account for uncertainty in analysis of imputed genotypes
journal, January 2011

  • Zheng, Jin; Li, Yun; Abecasis, Gonçalo R.
  • Genetic Epidemiology, Vol. 35, Issue 2
  • DOI: 10.1002/gepi.20552

DNA Polymorphisms Reveal Geographic Races of Reed Canarygrass
journal, November 2009


Large-Sample Significance Levels from Multiply Imputed Data Using Moment-Based Statistics and an F Reference Distribution
journal, December 1991

  • Li, K. H.; Raghunathan, T. E.; Rubin, D. B.
  • Journal of the American Statistical Association, Vol. 86, Issue 416
  • DOI: 10.2307/2290525

Flexible Imputation of Missing Data
book, March 2012


Divergent Selection for Secondary Traits in Upland Tetraploid Switchgrass and Effects on Sward Biomass Yield
journal, September 2013


Genetic evidence suggests a widespread distribution of native North American populations of reed canarygrass
journal, August 2012

  • Jakubowski, Andrew R.; Casler, Michael D.; Jackson, Randall D.
  • Biological Invasions, Vol. 15, Issue 2
  • DOI: 10.1007/s10530-012-0300-3

Genetic diversity and population structure of Eurasian populations of reed canarygrass: cytotypes, cultivars, and interspecific hybrids
journal, January 2011

  • Jakubowski, Andrew R.; Jackson, Randall D.; Johnson, R. C.
  • Crop and Pasture Science, Vol. 62, Issue 11
  • DOI: 10.1071/CP11232

Quantifying Actual and Theoretical Ethanol Yields for Switchgrass Strains Using NIRS Analyses
journal, August 2010


Pyrolysis of energy crops including alfalfa stems, reed canarygrass, and eastern gamagrass
journal, December 2006


Genetic Variability for Biofuel Traits in a Circumglobal Reed Canarygrass Collection
journal, March 2013


Different plant parts as raw material for fuel and pulp production
journal, March 2000


Genomic Selection in Wheat Breeding using Genotyping-by-Sequencing
journal, January 2012


Multiple Imputation for Interval Estimation From Simple Random Samples With Ignorable Nonresponse
journal, June 1986

  • Rubin, Donald B.; Schenker, Nathaniel
  • Journal of the American Statistical Association, Vol. 81, Issue 394
  • DOI: 10.2307/2289225

Tetraploid and hexaploid chromosome races of Phalaris arundinacea L.
journal, January 1962

  • McWilliam, J. R.; Neal-Smith, C. A.
  • Australian Journal of Agricultural Research, Vol. 13, Issue 1
  • DOI: 10.1071/AR9620001

Gramene: a bird's eye view of cereal genomes
journal, January 2006


Thin plate regression splines
journal, February 2003

  • Wood, Simon N.
  • Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 65, Issue 1
  • DOI: 10.1111/1467-9868.00374

Chemical composition and response to dilute-acid pretreatment and enzymatic saccharification of alfalfa, reed canarygrass, and switchgrass
journal, October 2006


Imputation of Unordered Markers and the Impact on Genomic Selection Accuracy
journal, March 2013

  • Rutkoski, Jessica E.; Poland, Jesse; Jannink, Jean-Luc
  • G3: Genes|Genomes|Genetics, Vol. 3, Issue 3
  • DOI: 10.1534/g3.112.005363

Status and Prospects of Association Mapping in Plants
journal, January 2008


A new multipoint method for genome-wide association studies by imputation of genotypes
journal, June 2007

  • Marchini, Jonathan; Howie, Bryan; Myers, Simon
  • Nature Genetics, Vol. 39, Issue 7
  • DOI: 10.1038/ng2088

Genotyping-by-Sequencing for Plant Breeding and Genetics
journal, January 2012


Extremely low-coverage sequencing and imputation increases power for genome-wide association studies
journal, May 2012

  • Pasaniuc, Bogdan; Rohland, Nadin; McLaren, Paul J.
  • Nature Genetics, Vol. 44, Issue 6
  • DOI: 10.1038/ng.2283

Imputation-Based Analysis of Association Studies: Candidate Regions and Quantitative Traits
journal, July 2007


A unified mixed-model method for association mapping that accounts for multiple levels of relatedness
journal, December 2005

  • Yu, Jianming; Pressoir, Gael; Briggs, William H.
  • Nature Genetics, Vol. 38, Issue 2
  • DOI: 10.1038/ng1702

Miscellanea. Small-sample degrees of freedom with multiple imputation
journal, December 1999


Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls
journal, June 2009

  • Sterne, J. A. C.; White, I. R.; Carlin, J. B.
  • BMJ, Vol. 338, Issue jun29 1
  • DOI: 10.1136/bmj.b2393

A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species
journal, May 2011


Multiple Imputation After 18+ Years
journal, June 1996

  • Rubin, Donald B.
  • Journal of the American Statistical Association, Vol. 91, Issue 434
  • DOI: 10.2307/2291635

Chemical composition of herbaceous grass and legume species grown for maximum biomass production
journal, January 1988


Efficient Control of Population Structure in Model Organism Association Mapping
journal, March 2008


Revision of the genus Phalaris L. (Gramineae)
journal, January 1995


Statistical significance for genomewide studies
journal, July 2003

  • Storey, J. D.; Tibshirani, R.
  • Proceedings of the National Academy of Sciences, Vol. 100, Issue 16, p. 9440-9445
  • DOI: 10.1073/pnas.1530509100

Multiple Imputation of Missing Phenotype Data for QTL Mapping
journal, January 2011

  • Bobb, Jennifer F.; Scharfstein, Daniel O.; Daniels, Michael J.
  • Statistical Applications in Genetics and Molecular Biology, Vol. 10, Issue 1
  • DOI: 10.2202/1544-6115.1676

Practical Issues in Imputation-Based Association Mapping
journal, December 2008


Landfill Leachate Recirculation: Effects on Vegetation Vigor and Clay Surface Cover Infiltration
journal, January 1991


Reed Canarygrass and Other Phalaris Species
book, October 2015


Yield Components of Biomass in Switchgrass
journal, January 2008


A Two-Stage Technique for the in Vitro Digestion of Forage Crops
journal, June 1963


Genetic Modification of Herbaceous Plants for Feed and Fuel
journal, January 2001

  • Vogel, Kenneth P.; Jung, Hans-Joachim G.
  • Critical Reviews in Plant Sciences, Vol. 20, Issue 1
  • DOI: 10.1080/20013591099173

Imputation-based analysis of association studies: candidate regions and quantitative traits
journal, January 2005


Works referencing / citing this record:

A reassessment of the genome size–invasiveness relationship in reed canarygrass (Phalaris arundinacea)
journal, March 2018

  • Martinez, Megan A.; Baack, Eric J.; Hovick, Stephen M.
  • Annals of Botany, Vol. 121, Issue 7
  • DOI: 10.1093/aob/mcy028

Genotyping-by-sequencing provides the discriminating power to investigate the subspecies of Daucus carota (Apiaceae)
journal, October 2016

  • Arbizu, Carlos I.; Ellison, Shelby L.; Senalik, Douglas
  • BMC Evolutionary Biology, Vol. 16, Issue 1
  • DOI: 10.1186/s12862-016-0806-x

Variation in sequences containing microsatellite motifs in the perennial biomass and forage grass, Phalaris arundinacea (Poaceae)
journal, March 2016

  • Barth, Susanne; Jankowska, Marta Jolanta; Hodkinson, Trevor Roland
  • BMC Research Notes, Vol. 9, Issue 1
  • DOI: 10.1186/s13104-016-1994-6

Genome-wide association mapping in winter barley for grain yield and culm cell wall polymer content using the high-throughput CoMPP technique
journal, March 2017


Association Mapping in Scandinavian Winter Wheat for Yield, Plant Height, and Traits Important for Second-Generation Bioethanol Production
journal, November 2015

  • Bellucci, Andrea; Torp, Anna Maria; Bruun, Sander
  • Frontiers in Plant Science, Vol. 6
  • DOI: 10.3389/fpls.2015.01046