Deeplasmid: deep learning accurately separates plasmids from bacterial chromosomes
Journal Article · Nucleic Acids Research
- USDOE Joint Genome Institute (JGI), Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); San Jose State Univ., CA (United States)
- Hebrew Univ. of Jerusalem (Israel)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
- USDOE Joint Genome Institute (JGI), Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Plasmids are mobile genetic elements that play a key role in microbial ecology and evolution by mediating horizontal transfer of important genes, such as antimicrobial resistance genes. Many microbial genomes have been sequenced by short read sequencers and have resulted in a mix of contigs that derive from plasmids or chromosomes. New tools that accurately identify plasmids are needed to elucidate new plasmid-borne genes of high biological importance. We have developed Deeplasmid, a deep learning tool for distinguishing plasmids from bacterial chromosomes based on the DNA sequence and its encoded biological data. It requires as input only assembled sequences generated by any sequencing platform and assembly algorithm and its runtime scales linearly with the number of assembled sequences. Deeplasmid achieves an AUC–ROC of over 89%, and it was more accurate than five other plasmid classification methods. Finally, as a proof of concept, we used Deeplasmid to predict new plasmids in the fish pathogen Yersinia ruckeri ATCC 29473 that has no annotated plasmids. Deeplasmid predicted with high reliability that a long assembled contig is part of a plasmid. Using long read sequencing we indeed validated the existence of a 102 kb long plasmid, demonstrating Deeplasmid's ability to detect novel plasmids.
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- Israeli Ministry of Agriculture; Israeli Science Foundation; USDOE Office of Science (SC), Biological and Environmental Research (BER)
- Grant/Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1894067
- Journal Information:
- Journal Name: Nucleic Acids Research; Journal Volume: 50; Journal Issue: 3; ISSN 0305-1048
- Publisher:
- Oxford University Press
- Country of Publication:
- United States
- Language:
- English
Similar Records
Classification of bacterial plasmid and chromosome derived sequences using machine learning
Journal Article · Thu Dec 15 19:00:00 EST 2022 · PLoS ONE · OSTI ID:2320222
SourceFinder: a Machine-Learning-Based Tool for Identification of Chromosomal, Plasmid, and Bacteriophage Sequences from Assemblies
Journal Article · Mon Nov 14 19:00:00 EST 2022 · Microbiology Spectrum · OSTI ID:1984973
Integration of Complete Plasmids Containing Bont Genes into Chromosomes of Clostridium parabotulinum, Clostridium sporogenes, and Clostridium argentinense
Journal Article · Wed Jul 07 20:00:00 EDT 2021 · Toxins · OSTI ID:1815750