DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Defining window-boundaries for genomic analyses using smoothing spline techniques

Abstract

High-density genomic data are often analyzed by combining information over windows of adjacent markers. Interpreting data grouped in windows, rather than at individual locations, may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since window sizes are determined empirically and allowed to vary along the genome.
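
The two steps described in the abstract (fit a cubic smoothing spline, then cut windows at its inflection points) can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the smoothing parameter `alpha` is chosen by hand rather than estimated from the data, and the spline is obtained from the standard linear system for a natural cubic smoothing spline that minimises `sum((y - g)^2) + alpha * integral(g''^2)`. Because the second derivative of a cubic spline is piecewise linear between knots, inflection points are located exactly where it changes sign.

```python
import math


def fit_smoothing_spline(x, y, alpha):
    """Natural cubic smoothing spline at the knots x (strictly increasing).

    Returns (g, gamma): fitted values and second derivatives at each knot.
    gamma is zero at both end knots (natural boundary condition).
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points")
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    m = n - 2  # number of interior knots
    # Column j of the n x m matrix Q has three non-zero entries, rows j..j+2.
    q = [(1.0 / h[j], -1.0 / h[j] - 1.0 / h[j + 1], 1.0 / h[j + 1])
         for j in range(m)]
    # A = R + alpha * Q^T Q, where R is symmetric tridiagonal.
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        A[i][i] = (h[i] + h[i + 1]) / 3.0
        if i + 1 < m:
            A[i][i + 1] = A[i + 1][i] = h[i + 1] / 6.0
    for i in range(m):
        for j in range(i, min(i + 3, m)):
            # Columns i and j of Q overlap on rows j .. i+2.
            s = sum(q[i][r - i] * q[j][r - j] for r in range(j, i + 3))
            A[i][j] += alpha * s
            if j != i:
                A[j][i] += alpha * s
    rhs = [q[j][0] * y[j] + q[j][1] * y[j + 1] + q[j][2] * y[j + 2]
           for j in range(m)]
    # Banded Gaussian elimination (A is symmetric positive definite,
    # bandwidth 2, so no pivoting is needed).
    for k in range(m):
        for i in range(k + 1, min(k + 3, m)):
            f = A[i][k] / A[k][k]
            for j in range(k, min(k + 3, m)):
                A[i][j] -= f * A[k][j]
            rhs[i] -= f * rhs[k]
    inner = [0.0] * m
    for k in range(m - 1, -1, -1):
        s = rhs[k]
        for j in range(k + 1, min(k + 3, m)):
            s -= A[k][j] * inner[j]
        inner[k] = s / A[k][k]
    # Fitted values: g = y - alpha * Q * gamma.
    g = list(y)
    for j in range(m):
        for t in range(3):
            g[j + t] -= alpha * q[j][t] * inner[j]
    return g, [0.0] + inner + [0.0]


def inflection_points(x, gamma):
    """Positions where the piecewise-linear second derivative changes sign."""
    points = []
    for i in range(len(x) - 1):
        a, b = gamma[i], gamma[i + 1]
        if a * b < 0:  # strict sign change; exact zeros at knots are ignored
            points.append(x[i] + (x[i + 1] - x[i]) * a / (a - b))
    return points


def windows_from_inflections(x, gamma):
    """Adjacent (start, end) windows bounded by inflection points."""
    bounds = [x[0]] + inflection_points(x, gamma) + [x[-1]]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Toy "marker positions" and a smooth per-marker statistic (made-up data).
    positions = [1.0 + 0.15 * i for i in range(71)]
    statistic = [math.sin(p) for p in positions]
    _, second_derivs = fit_smoothing_spline(positions, statistic, alpha=0.01)
    for start, end in windows_from_inflections(positions, second_derivs):
        print(f"window: {start:.3f} to {end:.3f}")
```

In this sketch the window widths are not fixed in advance: they follow from wherever the fitted curve switches between concave and convex, so they vary along the position axis. A larger `alpha` gives a smoother fit and therefore fewer, wider windows.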

Authors:
 Beissinger, Timothy M. [1]; Rosa, Guilherme J.M. [2]; Kaeppler, Shawn M. [3]; Gianola, Daniel [4]; de Leon, Natalia [3]
  1. Univ. of California, Davis, CA (United States). Dept. of Plant Sciences.
  2. Univ. of Wisconsin, Madison, WI (United States). Dept. of Animal Sciences and Dept. of Biostatistics and Medical Informatics.
  3. Univ. of Wisconsin, Madison, WI (United States). Dept. of Agronomy and Dept. of Energy Great Lakes Bioenergy Research Center.
  4. Univ. of Wisconsin, Madison, WI (United States). Dept. of Animal Sciences, Dept. of Biostatistics and Medical Informatics, and Dept. of Dairy Science.
Publication Date:
2015-04-17
Research Org.:
Univ. of Wisconsin, Madison, WI (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1184786
Grant/Contract Number:  
FC02-07ER64494
Resource Type:
Accepted Manuscript
Journal Name:
Genetics Selection Evolution (Online)
Additional Journal Information:
Journal Name: Genetics Selection Evolution (Online); Journal Volume: 47; Journal Issue: 1; Journal ID: ISSN 1297-9686
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Beissinger, Timothy M., Rosa, Guilherme J.M., Kaeppler, Shawn M., Gianola, Daniel, and de Leon, Natalia. Defining window-boundaries for genomic analyses using smoothing spline techniques. United States: N. p., 2015. Web. doi:10.1186/s12711-015-0105-9.
Beissinger, Timothy M., Rosa, Guilherme J.M., Kaeppler, Shawn M., Gianola, Daniel, & de Leon, Natalia. Defining window-boundaries for genomic analyses using smoothing spline techniques. United States. https://doi.org/10.1186/s12711-015-0105-9
Beissinger, Timothy M., Rosa, Guilherme J.M., Kaeppler, Shawn M., Gianola, Daniel, and de Leon, Natalia. 2015. "Defining window-boundaries for genomic analyses using smoothing spline techniques". United States. https://doi.org/10.1186/s12711-015-0105-9. https://www.osti.gov/servlets/purl/1184786.
@article{osti_1184786,
title = {Defining window-boundaries for genomic analyses using smoothing spline techniques},
author = {Beissinger, Timothy M. and Rosa, Guilherme J.M. and Kaeppler, Shawn M. and Gianola, Daniel and de Leon, Natalia},
abstractNote = {High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpretation of data grouped in windows versus at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since these are determined empirically and allowed to vary along the genome.},
doi = {10.1186/s12711-015-0105-9},
journal = {Genetics Selection Evolution (Online)},
number = 1,
volume = 47,
place = {United States},
year = {2015},
month = apr
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 55 works
Citation information provided by
Web of Science
