OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: PATtyFams: Protein families for the microbial genomes in the PATRIC database

Journal Article · Frontiers in Microbiology
 [1];  [2];  [3];  [1];  [2];  [1];  [2];  [4];  [1]
  1. Univ. of Chicago, IL (United States); Argonne National Lab. (ANL), Argonne, IL (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States); Fellowship for Interpretation of Genomes, Burr Ridge, IL (United States)
  3. Univ. of Illinois at Urbana-Champaign, Urbana, IL (United States)
  4. Virginia Bioinformatics Institute, Virginia Tech University, Blacksburg, VA (United States)

The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.
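The two-stage idea in the abstract (group proteins by assigned function, then split each group with MCL) can be illustrated with a minimal sketch. This is not the PATtyFams implementation: the function labels are taken as given rather than computed from k-mers, the pairwise similarity scores are supplied by the caller, and the names `normalize_columns`, `mcl`, `build_families`, and all protein identifiers are hypothetical. The MCL loop is a plain textbook version (expansion by matrix squaring, inflation by elementwise powering), suitable only for tiny examples.

```python
from collections import defaultdict


def normalize_columns(m):
    """Rescale a square matrix so that every column sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster a small weighted graph with a plain Markov Cluster iteration.

    Returns a set of frozensets of node indices.
    """
    n = len(adjacency)
    # Add self-loops so no column is empty, then make the matrix column-stochastic.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power and renormalize each column.
        inflated = normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows with a nonzero diagonal entry are attractors; the nonzero
    # columns of an attractor row form one cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-6))
    return clusters


def build_families(functions, similarity, inflation=2.0):
    """Group proteins by function label, then split each group with MCL.

    functions:  dict mapping protein ID -> function label (assumed given).
    similarity: dict mapping frozenset({id_a, id_b}) -> score (assumed given).
    """
    groups = defaultdict(list)
    for protein, function in functions.items():
        groups[function].append(protein)
    families = []
    for function, members in groups.items():
        adjacency = [
            [0.0 if a == b else similarity.get(frozenset((a, b)), 0.0)
             for b in members]
            for a in members
        ]
        for cluster in mcl(adjacency, inflation):
            families.append((function, sorted(members[i] for i in cluster)))
    return sorted(families)


# Toy input: p3 shares a function label with p1 and p2 but has no
# similarity to them, so it ends up in its own family.
toy_functions = {"p1": "function A", "p2": "function A", "p3": "function A",
                 "p4": "function B", "p5": "function B"}
toy_similarity = {frozenset(("p1", "p2")): 1.0, frozenset(("p4", "p5")): 1.0}
toy_families = build_families(toy_functions, toy_similarity)
```

In this sketch the function label only restricts which proteins are compared; the splitting into families is left entirely to the clustering step, and the inflation parameter controls how finely MCL partitions each group.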

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC02-06CH11357; NNA13AA91A
OSTI ID:
1248167
Journal Information:
Frontiers in Microbiology, Vol. 7; ISSN 1664-302X
Publisher:
Frontiers Research Foundation
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 102 works
Citation information provided by
Web of Science

Cited By (17)

Assembly, Annotation, and Comparative Genomics in PATRIC, the All Bacterial Bioinformatics Resource Center book December 2017
Putative virulence factors of Plesiomonas shigelloides journal August 2019
PATRIC as a unique resource for studying antimicrobial resistance journal July 2017
Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center journal November 2016
The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities journal October 2019
SwiftOrtho: a Fast, Memory-Efficient, Multiple Genome Orthology Classifier journal February 2019
Comparative Genomics Reveals a Well-Conserved Intrinsic Resistome in the Emerging Multidrug-Resistant Pathogen Cupriavidus gilardii journal October 2019
Identification and molecular epidemiology of methicillin resistant Staphylococcus pseudintermedius strains isolated from canine clinical samples in Argentina journal July 2019
Trait-based analysis of the human skin microbiome journal July 2019
First genome sequencing and comparative analyses of Corynebacterium pseudotuberculosis strains from Mexico journal October 2018
Whole Genome Sequencing and Comparative Genomics of Two Nematicidal Bacillus Strains Reveals a Wide Range of Possible Virulence Factors journal January 2020
Brucella spp. of amphibians comprise genomically diverse motile strains competent for replication in macrophages and survival in mammalian hosts text January 2017
A Diverse Repertoire of Exopolysaccharide Biosynthesis Gene Clusters in Lactobacillus Revealed by Comparative Analysis in 106 Sequenced Genomes journal October 2019
Brucella spp. of amphibians comprise genomically diverse motile strains competent for replication in macrophages and survival in mammalian hosts journal March 2017
SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier journal October 2019
Draft Genome Sequence of a Chryseobacterium indologenes Strain Isolated from a Blood Culture of a Hospitalized Child in Antananarivo, Madagascar journal August 2019
  • Rabenandrasana, Mamitina Alain Noah; Rafetrarivony, Lala Fanomezantsoa; Rivoarilala, Lalainasoa Odile
  • Microbiology Resource Announcements, Vol. 8, Issue 35 https://doi.org/10.1128/mra.00752-19
Corrigendum journal September 2020
