U.S. Department of Energy
Office of Scientific and Technical Information

Ortholog identification in the presence of domain architecture rearrangement

Journal Article · Briefings in Bioinformatics
DOI: https://doi.org/10.1093/bib/bbr036 · OSTI ID: 1904524
 [1];  [2];  [2];  [2]
  1. University of California, Berkeley, CA (United States)
  2. University of California, Berkeley, CA (United States)
Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets that share a common domain architecture, or that share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area.
Research Organization:
University of California, Berkeley, CA (United States); University of California, Oakland, CA (United States)
Sponsoring Organization:
National Science Foundation (NSF); USDOE Office of Science (SC)
Grant/Contract Number:
SC0004916
OSTI ID:
1904524
Journal Information:
Briefings in Bioinformatics, Vol. 12, Issue 5; ISSN 1467-5463
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
