U.S. Department of Energy
Office of Scientific and Technical Information

LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning

Journal Article · ACS Synthetic Biology
 [1];  [1];  [1];  [2];  [3];  [1]
  1. California Institute of Technology (CalTech), Pasadena, CA (United States)
  2. California Institute of Technology (CalTech), Pasadena, CA (United States); ETH Zurich, Basel (Switzerland)
  3. California Institute of Technology (CalTech), Pasadena, CA (United States); Merck & Co., Inc., South San Francisco, CA (United States)
Sequence-function data provides valuable information about the protein functional landscape but is rarely obtained during directed evolution campaigns. Here, we present Long-read every variant Sequencing (LevSeq), a pipeline that combines a dual barcoding strategy with nanopore sequencing to rapidly generate sequence-function data for entire protein-coding genes. LevSeq integrates into existing protein engineering workflows and comes with open-source software for data analysis and visualization. The pipeline facilitates data-driven protein engineering by consolidating sequence-function data to inform directed evolution and provide the requisite data for machine learning-guided protein engineering (MLPE). LevSeq enables quality control of mutagenesis libraries prior to screening, which reduces time and resource costs. Simulation studies demonstrate LevSeq’s ability to accurately detect variants under various experimental conditions. Lastly, we show LevSeq’s utility in engineering protoglobins for new-to-nature chemistry. Widespread adoption of LevSeq and sharing of the data will enhance our understanding of protein sequence-function landscapes and empower data-driven directed evolution.
Research Organization:
California Institute of Technology (CalTech), Pasadena, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
SC0022218
OSTI ID:
2567050
Journal Information:
ACS Synthetic Biology, Vol. 14, Issue 1; ISSN 2161-5063
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English
