Title: Generalization of the Ewens sampling formula to arbitrary fitness landscapes

Abstract

In considering evolution of transcribed regions, regulatory sequences, and other genomic loci, we are often faced with a situation in which the number of allelic states greatly exceeds the size of the population. In this limit, the population eventually adopts a steady state characterized by mutation-selection-drift balance. Although new alleles continue to be explored through mutation, the statistics of the population, and in particular the probabilities of seeing specific allelic configurations in samples taken from the population, do not change with time. In the absence of selection, the probabilities of allelic configurations are given by the Ewens sampling formula, widely used in population genetics to detect deviations from neutrality. Here we develop an extension of this formula to arbitrary fitness distributions. Although our approach is general, we focus on the class of fitness landscapes, inspired by recent high-throughput genotype-phenotype maps, in which alleles can be in several distinct phenotypic states. This class of landscapes yields sampling probabilities that are computationally more tractable and can form a basis for inference of selection signatures from genomic data. Using an efficient numerical implementation of the sampling probabilities, we demonstrate that, for a sizable range of mutation rates and selection coefficients, the steady-state allelic diversity is not neutral. Therefore, it may be used to infer selection coefficients, as well as other evolutionary parameters from population data. We also carry out numerical simulations to challenge various approximations involved in deriving our sampling formulas, such as the infinite-allele limit and the “full connectivity” assumption inherent in the Ewens theory, in which each allele can mutate into any other allele. We find that, at least for the specific numerical examples studied, our theory remains sufficiently accurate even if these assumptions are relaxed. Thus our framework establishes both theoretical and practical foundations for inferring selection signatures from population-level genomic sequence samples.
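
For orientation, the neutral baseline mentioned in the abstract can be written down directly. The sketch below evaluates the standard (selection-free) Ewens sampling formula, P(a_1, ..., a_n) = n! / [θ(θ+1)...(θ+n-1)] × Π_j θ^{a_j} / (j^{a_j} a_j!), where a_j is the number of allelic types seen exactly j times in a sample of size n. It is not the generalized formula developed in the article; the function name `ewens_probability` and the use of `theta` for the scaled mutation rate are illustrative assumptions.

```python
from math import factorial


def ewens_probability(counts, theta):
    """Neutral Ewens sampling probability of an allelic configuration.

    counts[j-1] is the number of allelic types observed exactly j times
    in the sample (hypothetical argument layout chosen for this sketch).
    theta is the scaled mutation rate and must be positive.
    """
    # Sample size implied by the configuration: n = sum_j j * a_j.
    n = sum(j * a for j, a in enumerate(counts, start=1))

    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = 1.0
    for i in range(n):
        rising *= theta + i

    prob = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob


if __name__ == "__main__":
    theta = 2.0
    # All three configurations of a sample of size 3:
    # three singletons, one singleton plus one doubleton, one tripleton.
    configs = [[3, 0, 0], [1, 1, 0], [0, 0, 1]]
    for c in configs:
        print(c, ewens_probability(c, theta))
    print("total:", sum(ewens_probability(c, theta) for c in configs))
```

Summing over every configuration of a fixed sample size gives a total probability of one, which is a convenient sanity check. Deviations of observed configurations from these neutral probabilities are what neutrality tests based on the formula look for.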

Authors:
Khromov, Pavel; Malliaris, Constantin D.; Morozov, Alexandre V.
Publication Date:
Research Org.:
Rutgers Univ., Piscataway, NJ (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1416649
Alternate Identifier(s):
OSTI ID: 1627847
Grant/Contract Number:  
LANL-DOE 20150236ER; FG02-00ER41132
Resource Type:
Published Article
Journal Name:
PLoS ONE
Additional Journal Information:
Journal Name: PLoS ONE; Journal Volume: 13; Journal Issue: 1; Journal ID: ISSN 1932-6203
Publisher:
Public Library of Science (PLoS)
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Science & Technology - Other Topics

Citation Formats

Khromov, Pavel, Malliaris, Constantin D., and Morozov, Alexandre V. Generalization of the Ewens sampling formula to arbitrary fitness landscapes. Edited by Suzannah Rutherford. United States: N. p., 2018. Web. https://doi.org/10.1371/journal.pone.0190186.
Khromov, Pavel, Malliaris, Constantin D., & Morozov, Alexandre V. Generalization of the Ewens sampling formula to arbitrary fitness landscapes (S. Rutherford, Ed.). United States. https://doi.org/10.1371/journal.pone.0190186
Khromov, Pavel, Malliaris, Constantin D., and Morozov, Alexandre V. 2018. "Generalization of the Ewens sampling formula to arbitrary fitness landscapes". Edited by Suzannah Rutherford. United States. https://doi.org/10.1371/journal.pone.0190186.
@article{osti_1416649,
title = {Generalization of the Ewens sampling formula to arbitrary fitness landscapes},
author = {Khromov, Pavel and Malliaris, Constantin D. and Morozov, Alexandre V.},
editor = {Rutherford, Suzannah},
doi = {10.1371/journal.pone.0190186},
journal = {PLoS ONE},
number = 1,
volume = 13,
place = {United States},
year = {2018},
month = {1}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1371/journal.pone.0190186
