OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Deeplasmid: deep learning accurately separates plasmids from bacterial chromosomes

Journal Article · Nucleic Acids Research
DOI: https://doi.org/10.1093/nar/gkab1115 · OSTI ID: 1894067
 [1];  [2];  [2];  [3];  [4];  [4];  [2]
  1. USDOE Joint Genome Institute (JGI), Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); San Jose State Univ., CA (United States)
  2. Hebrew Univ. of Jerusalem (Israel)
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
  4. USDOE Joint Genome Institute (JGI), Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Plasmids are mobile genetic elements that play a key role in microbial ecology and evolution by mediating horizontal transfer of important genes, such as antimicrobial resistance genes. Many microbial genomes have been sequenced by short read sequencers and have resulted in a mix of contigs that derive from plasmids or chromosomes. New tools that accurately identify plasmids are needed to elucidate new plasmid-borne genes of high biological importance. We have developed Deeplasmid, a deep learning tool for distinguishing plasmids from bacterial chromosomes based on the DNA sequence and its encoded biological data. It requires as input only assembled sequences generated by any sequencing platform and assembly algorithm and its runtime scales linearly with the number of assembled sequences. Deeplasmid achieves an AUC–ROC of over 89%, and it was more accurate than five other plasmid classification methods. Finally, as a proof of concept, we used Deeplasmid to predict new plasmids in the fish pathogen Yersinia ruckeri ATCC 29473 that has no annotated plasmids. Deeplasmid predicted with high reliability that a long assembled contig is part of a plasmid. Using long read sequencing we indeed validated the existence of a 102 kb long plasmid, demonstrating Deeplasmid's ability to detect novel plasmids.

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER); Israeli Science Foundation; Israeli Ministry of Agriculture
Grant/Contract Number:
AC02-05CH11231; 1535/20; 3300/20; 12-12-0002
OSTI ID:
1894067
Journal Information:
Nucleic Acids Research, Vol. 50, Issue 3; ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
