U.S. Department of Energy
Office of Scientific and Technical Information

evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library

Journal Article · ACS Synthetic Biology
Widespread availability of protein sequence-fitness data would revolutionize both our biochemical understanding of proteins and our ability to engineer them. Unfortunately, even though thousands of protein variants are generated and evaluated for fitness during a typical protein engineering campaign, most are never sequenced, leaving a wealth of potential sequence-fitness information untapped. Primarily, this is because sequencing is unnecessary for many protein engineering strategies; the added cost and effort of sequencing are thus unjustified. It also results from the fact that, even though many lower cost sequencing strategies have been developed, they often require at least some sequencing or computational resources, both of which can be barriers to access. In this work, we present every variant sequencing (evSeq), a method and collection of tools/standardized components for sequencing a variable region within every variant gene produced during a protein engineering campaign at a cost of cents per variant. evSeq was designed to democratize low-cost sequencing for protein engineers and, indeed, anyone interested in engineering biological systems. Execution of its wet-lab component is simple, requires no sequencing experience to perform, relies only on resources and services typically available to biology labs, and slots neatly into existing protein engineering workflows. Analysis of evSeq data is likewise made simple by its accompanying software (found at github.com/fhalab/evSeq, documentation at fhalab.github.io/evSeq), which can be run on a personal laptop and was designed to be accessible to users with no computational experience. Low-cost and easy to use, evSeq makes collection of extensive protein variant sequence-fitness data practical.
Research Organization:
California Institute of Technology (Caltech), Pasadena, CA (United States)
Sponsoring Organization:
National Science Foundation (NSF); USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
SC0022218
OSTI ID:
1853986
Alternate ID(s):
OSTI ID: 1855942
Journal Information:
ACS Synthetic Biology, Vol. 11, Issue 3; ISSN 2161-5063
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English