Defining window-boundaries for genomic analyses using smoothing spline techniques
Abstract
High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpreting data grouped in windows rather than at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since window sizes are determined empirically and allowed to vary along the genome.
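The two steps named in the abstract (fit a cubic smoothing spline, then take its inflection points as window boundaries) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes SciPy's `UnivariateSpline`, whose smoothing factor `s` is a stand-in for whatever smoothing-parameter selection the paper uses, and the function names, simulated marker positions, and per-window mean are hypothetical choices made only for illustration.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def inflection_points(positions, values, s=None):
    """Fit a cubic smoothing spline and return the positions where its
    second derivative changes sign (the inflection points)."""
    x = np.asarray(positions, dtype=float)
    y = np.asarray(values, dtype=float)
    # Assumption: SciPy's smoothing factor `s` controls the smoothness.
    spline = UnivariateSpline(x, y, k=3, s=s)  # x must be increasing

    # The second derivative of a cubic spline is piecewise linear between
    # knots, so a sign change between two knots brackets exactly one zero.
    # (A zero lying exactly on a knot is ignored in this sketch.)
    knots = spline.get_knots()
    d2 = spline(knots, nu=2)
    left, right = d2[:-1], d2[1:]
    crossing = left * right < 0
    # Linear interpolation for the zero inside each bracketing interval.
    frac = left[crossing] / (left[crossing] - right[crossing])
    return knots[:-1][crossing] + frac * np.diff(knots)[crossing]


def assign_windows(positions, boundaries):
    """Return a window index per marker; window i covers
    boundaries[i-1] <= position < boundaries[i]."""
    return np.searchsorted(boundaries, positions, side="right")


def window_means(window_id, values, n_windows):
    """Mean of the statistic in each window (illustrative summary only)."""
    counts = np.bincount(window_id, minlength=n_windows)
    sums = np.bincount(window_id, weights=values, minlength=n_windows)
    return sums / np.maximum(counts, 1)  # empty windows report 0


# Hypothetical data: a noisy per-marker statistic along one chromosome.
rng = np.random.default_rng(0)
pos = np.linspace(0.5, 12.0, 400)
stat = np.sin(pos) + rng.normal(0.0, 0.2, pos.size)

bounds = inflection_points(pos, stat, s=pos.size * 0.2 ** 2)
ids = assign_windows(pos, bounds)
means = window_means(ids, stat, len(bounds) + 1)
print(len(bounds) + 1, "windows of varying width")
```

Because the boundaries come from the fitted curve, window widths vary along the chromosome and no window size is fixed in advance. The number of windows in this sketch depends on the assumed value of `s`.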
- Authors:
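- Beissinger, Timothy M.; Rosa, Guilherme J.M.; Kaeppler, Shawn M.; Gianola, Daniel; de Leon, Natalia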
- Univ. of California, Davis, CA (United States). Dept. Plant Sciences.
- Univ. of Wisconsin, Madison, WI (United States). Dept. of Animal Sciences and Dept. of Biostatistics and Medical Information.
- Univ. of Wisconsin, Madison, WI (United States). Dept. of Agronomy and Dept. of Energy Great Lakes Bioenergy Research Center.
- Univ. of Wisconsin, Madison, WI (United States). Dept. of Animal Sciences, Dept. of Biostatistics and Medical Information and Dept. of Dairy Science.
- Publication Date:
- 2015-04-17
- Research Org.:
- Univ. of Wisconsin, Madison, WI (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- OSTI Identifier:
- 1184786
- Grant/Contract Number:
- FC02-07ER64494
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Genetics Selection Evolution (Online)
- Additional Journal Information:
- Journal Name: Genetics Selection Evolution (Online); Journal Volume: 47; Journal Issue: 1; Journal ID: ISSN 1297-9686
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
Citation Formats
Beissinger, Timothy M., Rosa, Guilherme J.M., Kaeppler, Shawn M., Gianola, Daniel, and de Leon, Natalia. Defining window-boundaries for genomic analyses using smoothing spline techniques. United States: N. p., 2015. Web. doi:10.1186/s12711-015-0105-9.
Beissinger, Timothy M., Rosa, Guilherme J.M., Kaeppler, Shawn M., Gianola, Daniel, & de Leon, Natalia. Defining window-boundaries for genomic analyses using smoothing spline techniques. United States. https://doi.org/10.1186/s12711-015-0105-9
Beissinger, Timothy M., Rosa, Guilherme J.M., Kaeppler, Shawn M., Gianola, Daniel, and de Leon, Natalia. 2015. "Defining window-boundaries for genomic analyses using smoothing spline techniques". United States. https://doi.org/10.1186/s12711-015-0105-9. https://www.osti.gov/servlets/purl/1184786.
@article{osti_1184786,
title = {Defining window-boundaries for genomic analyses using smoothing spline techniques},
author = {Beissinger, Timothy M. and Rosa, Guilherme J.M. and Kaeppler, Shawn M. and Gianola, Daniel and de Leon, Natalia},
abstractNote = {High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpretation of data grouped in windows versus at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since these are determined empirically and allowed to vary along the genome.},
doi = {10.1186/s12711-015-0105-9},
journal = {Genetics Selection Evolution (Online)},
number = 1,
volume = 47,
place = {United States},
year = {2015},
month = {4}
}
Works referencing / citing this record:
Exploring Evolutionary Relationships Across the Genome Using Topology Weighting.
text, January 2017
- Martin, Simon; Van Belleghem, Steven
- Apollo - University of Cambridge Repository
Fixed-length haplotypes can improve genomic prediction accuracy in an admixed dairy cattle population
journal, July 2017
- Hess, Melanie; Druet, Tom; Hess, Andrew
- Genetics Selection Evolution, Vol. 49, Issue 1
Reaffirmation of known major genes and the identification of novel candidate genes associated with carcass-related metrics based on whole genome sequence within a large multi-breed cattle population
journal, September 2019
- Purfield, D. C.; Evans, R. D.; Berry, D. P.
- BMC Genomics, Vol. 20, Issue 1
A Nested Mixture Model for Genomic Prediction Using Whole-Genome SNP Genotypes
report, January 2016
- Zeng, Jian; Garrick, Dorian J.; Dekkers, Jack C.
- Iowa State University
Drosophila simulans : A Species with Improved Resolution in Evolve and Resequence Studies
journal, May 2017
- Barghi, Neda; Tobler, Raymond; Nolte, Viola
- G3: Genes|Genomes|Genetics, Vol. 7, Issue 7
Variance components for bovine tuberculosis infection and multi-breed genome-wide association analysis using imputed whole genome sequence data
journal, February 2019
- Ring, S. C.; Purfield, D. C.; Good, M.
- PLOS ONE, Vol. 14, Issue 2
Cassava haplotype map highlights fixation of deleterious mutations during clonal propagation
journal, April 2017
- Ramu, Punna; Esuma, Williams; Kawuki, Robert
- Nature Genetics, Vol. 49, Issue 6
Impact of polymorphic transposable elements on transcription in lymphoblastoid cell lines from public data
journal, November 2019
- Spirito, Giovanni; Mangoni, Damiano; Sanges, Remo
- BMC Bioinformatics, Vol. 20, Issue S9
GOOGA: A platform to synthesize mapping experiments and identify genomic structural diversity
journal, April 2019
- Flagel, Lex E.; Blackman, Benjamin K.; Fishman, Lila
- PLOS Computational Biology, Vol. 15, Issue 4
QTL-mapping and genomic prediction for bovine respiratory disease in U.S. Holsteins using sequence imputation and feature selection
journal, July 2019
- Hoff, Jesse L.; Decker, Jared E.; Schnabel, Robert D.
- BMC Genomics, Vol. 20, Issue 1
Pervasive Linked Selection and Intermediate-Frequency Alleles Are Implicated in an Evolve-and-Resequencing Experiment of Drosophila simulans
journal, December 2018
- Kelly, John K.; Hughes, Kimberly A.
- Genetics, Vol. 211, Issue 3
Functional models in genome-wide selection
journal, October 2019
- Moura, Ernandes Guedes; Pamplona, Andrezza Kellen Alves; Balestre, Marcio
- PLOS ONE, Vol. 14, Issue 10
Genome-wide association study on legendre random regression coefficients for the growth and feed intake trajectory on Duroc Boars
journal, May 2015
- Howard, Jeremy T.; Jiao, Shihui; Tiezzi, Francesco
- BMC Genetics, Vol. 16, Issue 1
Sliding window haplotype approaches overcome single SNP analysis limitations in identifying genes for meat tenderness in Nelore cattle
journal, January 2019
- Braz, Camila U.; Taylor, Jeremy F.; Bresolin, Tiago
- BMC Genetics, Vol. 20, Issue 1
Demographic history and genomics of local adaptation in blue tit populations
posted_content, May 2020
- Perrier, Charles; Rougemont, Quentin; Charmantier, Anne
- bioRxiv
Parasitism drives host genome evolution: Insights from the Pasteuria ramosa - Daphnia magna system : BRIEF COMMUNICATION
journal, March 2017
- Bourgeois, Yann; Roulin, Anne C.; Müller, Kristina
- Evolution, Vol. 71, Issue 4
The genomic basis of adaptation to calcareous and siliceous soils in Arabidopsis lyrata
text, January 2018
- Guggisberg, Alessia; Liu, Xuanyu; Suter, Léonie
- ETH Zurich
The genomic basis of adaptation to calcareous and siliceous soils in Arabidopsis lyrata
journal, December 2018
- Guggisberg, Alessia; Liu, Xuanyu; Suter, Léonie
- Molecular Ecology, Vol. 27, Issue 24
The identification of novel regions for reproduction trait in Landrace and Large White pigs using a single step genome-wide association study
journal, December 2018
- Suwannasing, Rattikan; Duangjinda, Monchai; Boonkum, Wuttigrai
- Asian-Australasian Journal of Animal Sciences, Vol. 31, Issue 12
Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations
journal, September 2017
- Fuentes-Pardo, Angela P.; Ruzzante, Daniel E.
- Molecular Ecology, Vol. 26, Issue 20
A nested mixture model for genomic prediction using whole-genome SNP genotypes
journal, March 2018
- Zeng, Jian; Garrick, Dorian; Dekkers, Jack
- PLOS ONE, Vol. 13, Issue 3
Linkage disequilibrium clustering‐based approach for association mapping with tightly linked genomewide data
journal, May 2018
- Li, Zitong; Kemppainen, Petri; Rastas, Pasi
- Molecular Ecology Resources, Vol. 18, Issue 4
Genomic regions influencing intramuscular fat in divergently selected rabbit lines
journal, November 2019
- Sosa‐Madrid, Bolívar S.; Hernández, Pilar; Blasco, Agustín
- Animal Genetics, Vol. 51, Issue 1
Genome-wide association study of endo-parasite phenotypes using imputed whole-genome sequence data in dairy and beef cattle
journal, April 2019
- Twomey, Alan J.; Berry, Donagh P.; Evans, Ross D.
- Genetics Selection Evolution, Vol. 51, Issue 1
Exploring Evolutionary Relationships Across the Genome Using Topology Weighting
journal, March 2017
- Martin, Simon H.; Van Belleghem, Steven M.
- Genetics, Vol. 206, Issue 1
Exploring evolutionary relationships across the genome using topology weighting
journal, January 2017
- Martin, Simon H.; Van Belleghem, Steven M.
- Genetics
Genome-wide genetic structure and differentially selected regions among Landrace, Erhualian, and Meishan pigs using specific-locus amplified fragment sequencing
journal, August 2017
- Li, Zhen; Wei, Shengjuan; Li, Hejun
- Scientific Reports, Vol. 7, Issue 1
Consistent signatures of selection from genomic analysis of pairs of temporal and spatial Plasmodium falciparum populations from The Gambia
journal, June 2018
- Amambua-Ngwa, Alfred; Jeffries, David; Amato, Roberto
- Scientific Reports, Vol. 8, Issue 1
Application of a Bayesian dominance model improves power in quantitative trait genome-wide association analysis
journal, January 2017
- Bennewitz, Jörn; Edel, Christian; Fries, Ruedi
- Genetics Selection Evolution, Vol. 49, Issue 1