
DOE PAGES

Title: PATtyFams: Protein families for the microbial genomes in the PATRIC database

Building accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have maintained protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) to drive many of the comparative analysis tools available through the PATRIC website. However, because the number of genomes is growing so quickly, traditional approaches for generating protein families are becoming computationally prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then splits the function-based groups into families using the Markov Cluster algorithm (MCL). This new approach is rapid and scalable, and it produces families whose properties are consistent with those of alignment-based methods.
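The two-stage idea in the abstract (group proteins by their RAST-assigned function, then split each group into families with MCL) can be illustrated with a small Python sketch. It rests on several assumptions: the function assignments and pairwise similarity scores are treated as precomputed inputs (neither the k-mer annotation step nor the similarity measure PATtyFams actually uses is shown), the names `build_families` and `mcl` are hypothetical, and the pure-Python MCL is a textbook version (expansion by squaring, inflation parameter 2.0), not the PATtyFams implementation or the interface of the standalone `mcl` program.

```python
from collections import defaultdict


def normalize_columns(m):
    """Scale each column of square matrix m (in place) to sum to 1."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def mat_mult(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(sim, inflation=2.0, max_iter=100, tol=1e-6):
    """Textbook Markov Cluster algorithm on a symmetric similarity matrix.

    Returns clusters as sorted lists of row/column indices.
    """
    n = len(sim)
    # Add self-loops (a common MCL convention), then make columns stochastic.
    m = [[sim[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        expanded = mat_mult(m, m)                     # expansion: M^2
        inflated = [[v ** inflation for v in row] for row in expanded]
        normalize_columns(inflated)                   # inflation: power + rescale
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; the columns they hold form a cluster.
    clusters, seen = [], set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters


def build_families(functions, similarity, inflation=2.0):
    """Group proteins by assigned function, then split each group with MCL.

    functions:  dict protein_id -> function string (assumed precomputed,
                e.g. by a k-mer annotation step that is not shown here)
    similarity: dict (id_a, id_b) -> score; missing pairs count as 0.0
    """
    groups = defaultdict(list)
    for pid, func in functions.items():
        groups[func].append(pid)

    families = []
    for func in sorted(groups):
        members = sorted(groups[func])
        n = len(members)
        sim = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    a, b = members[i], members[j]
                    sim[i][j] = similarity.get((a, b), similarity.get((b, a), 0.0))
        for cluster in mcl(sim, inflation=inflation):
            families.append((func, [members[k] for k in cluster]))
    return families


if __name__ == "__main__":
    # Hypothetical toy data: two functions, seven proteins.
    functions = {"p1": "function A", "p2": "function A", "p3": "function A",
                 "p4": "function A", "p5": "function B", "p6": "function B",
                 "p7": "function B"}
    similarity = {("p1", "p2"): 0.9, ("p3", "p4"): 0.8, ("p5", "p6"): 0.95}
    for func, members in build_families(functions, similarity):
        print(func, members)
```

On the toy data, "function A" splits into two families ({p1, p2} and {p3, p4}) because no similarity links the two pairs, and p7, which has no similar partner, becomes a singleton family under "function B". The nested-loop matrix product is cubic in group size, so a realistic pipeline would use sparse matrices or the standalone `mcl` program; grouping by function first keeps each MCL input small.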
Authors:
 [1] ;  [2] ;  [3] ;  [1] ;  [2] ;  [1] ;  [2] ;  [4] ;  [1]
  1. Univ. of Chicago, IL (United States); Argonne National Lab. (ANL), Argonne, IL (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States); Fellowship for Interpretation of Genomes, Burr Ridge, IL (United States)
  3. Univ. of Illinois at Urbana-Champaign, Urbana, IL (United States)
  4. Virginia Bioinformatics Institute, Virginia Tech University, Blacksburg, VA (United States)
Publication Date:
OSTI Identifier:
1248167
Grant/Contract Number:
AC02-06CH11357; NNA13AA91A
Type:
Accepted Manuscript
Journal Name:
Frontiers in Microbiology
Additional Journal Information:
Journal Volume: 7; Journal ID: ISSN 1664-302X
Publisher:
Frontiers Research Foundation
Research Org:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; genome annotation; comparative genomics; metabolic modeling; FIGfams; RAST