Title: CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets

Journal Article · Molecular Biology and Evolution (Online)

The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses; however, most methods rely on comparing all pairs of genomes, a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher, a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. CoreCruncher does not compute all pairwise genome comparisons; instead, it uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Although it is much faster than current methods, our results indicate that our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available from https://github.com/lbobay/CoreCruncher. It is written in Python 3.7 and can also run on Python 2.7 without modification. It requires the Python library NumPy and either USEARCH or BLAST. Certain options require the programs MUSCLE or MAFFT.
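The two ideas in the abstract, classifying hits from the distribution of identity scores and keeping genes shared by all or nearly all genomes, can be illustrated with a minimal sketch. This is not CoreCruncher's actual heuristic or code: the function names, the median/MAD outlier rule, and the parameters `k`, `min_gap`, and `min_fraction` are all assumptions chosen for illustration, and the input identities are made-up values.

```python
from statistics import median


def flag_low_identity_hits(identities, k=5.0, min_gap=5.0):
    """Label each hit by comparing it to the distribution of identity scores.

    identities: dict mapping a gene family name to the percent identity of
    its best hit between a reference genome and one other genome.
    A hit is labelled 'paralog/xenolog' when its identity falls far below
    the median of the distribution (illustrative rule, not CoreCruncher's).
    """
    values = list(identities.values())
    med = median(values)
    # Median absolute deviation: a spread estimate robust to outliers.
    mad = median(abs(v - med) for v in values)
    # Require a gap of at least min_gap identity points below the median.
    threshold = med - max(k * mad, min_gap)
    return {
        family: "ortholog" if score >= threshold else "paralog/xenolog"
        for family, score in identities.items()
    }


def core_genome(presence, min_fraction=0.9):
    """Return the sorted gene families found in at least min_fraction of genomes.

    presence: dict mapping a genome name to the set of gene families for
    which that genome has a retained ortholog.
    """
    n_genomes = len(presence)
    counts = {}
    for families in presence.values():
        for family in families:
            counts[family] = counts.get(family, 0) + 1
    return sorted(f for f, c in counts.items() if c >= min_fraction * n_genomes)


if __name__ == "__main__":
    # Hypothetical identity scores for one genome pair.
    hits = {"f1": 98.0, "f2": 97.5, "f3": 98.5, "f4": 97.0, "f5": 98.2, "f6": 60.0}
    print(flag_low_identity_hits(hits))

    # Hypothetical presence table for four genomes.
    table = {
        "A": {"f1", "f2", "f3"},
        "B": {"f1", "f2"},
        "C": {"f1", "f3"},
        "D": {"f1", "f2"},
    }
    print(core_genome(table, min_fraction=0.75))
```

In this sketch, lowering `min_fraction` below 1.0 corresponds to the "nearly all" part of the core genome definition, and the robust median-based threshold is used only because a single divergent hit would otherwise inflate a mean and standard deviation computed on a small sample.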

Research Organization:
University of North Carolina, Greensboro, NC (United States)
Sponsoring Organization:
USDOE; National Science Foundation (NSF); National Institute of General Medical Sciences (NIGMS)
Grant/Contract Number:
DEB-1831730; R01GM132137; DEB-11930776
OSTI ID:
1816371
Journal Information:
Molecular Biology and Evolution (Online), Vol. 38, Issue 2; ISSN 1537-1719
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
