LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning
Journal Article · ACS Synthetic Biology
- California Institute of Technology (CalTech), Pasadena, CA (United States)
- California Institute of Technology (CalTech), Pasadena, CA (United States); ETH Zurich, Basel (Switzerland)
- California Institute of Technology (CalTech), Pasadena, CA (United States); Merck & Co., Inc., South San Francisco, CA (United States)
Sequence-function data provides valuable information about the protein functional landscape but is rarely obtained during directed evolution campaigns. Here, we present Long-read every variant Sequencing (LevSeq), a pipeline that combines a dual barcoding strategy with nanopore sequencing to rapidly generate sequence-function data for entire protein-coding genes. LevSeq integrates into existing protein engineering workflows and comes with open-source software for data analysis and visualization. The pipeline facilitates data-driven protein engineering by consolidating sequence-function data to inform directed evolution and provide the requisite data for machine learning-guided protein engineering (MLPE). LevSeq enables quality control of mutagenesis libraries prior to screening, which reduces time and resource costs. Simulation studies demonstrate LevSeq’s ability to accurately detect variants under various experimental conditions. Lastly, we show LevSeq’s utility in engineering protoglobins for new-to-nature chemistry. Widespread adoption of LevSeq and sharing of the data will enhance our understanding of protein sequence-function landscapes and empower data-driven directed evolution.
- Research Organization:
- California Institute of Technology (CalTech), Pasadena, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Basic Energy Sciences (BES)
- Grant/Contract Number:
- SC0022218
- OSTI ID:
- 2567050
- Journal Information:
- Journal Name: ACS Synthetic Biology; Journal Volume: 14; Journal Issue: 1; ISSN 2161-5063
- Publisher:
- American Chemical Society (ACS)
- Country of Publication:
- United States
- Language:
- English
Similar Records
evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library
Journal Article · Wed Feb 16 19:00:00 EST 2022 · ACS Synthetic Biology · OSTI ID:1853986
Enzyme Engineering Database (EnzEngDB): a platform for sharing and interpreting sequence–function relationships across protein engineering campaigns
Journal Article · Sun Dec 07 19:00:00 EST 2025 · Nucleic Acids Research · OSTI ID:3014245