DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Gene fusions and gene duplications: relevance to genomic annotation and functional analysis

Journal Article · BMC Genomics
 [1];  [1]
  1. Marine Biological Laboratory, Woods Hole, MA (United States). Bay Paul Center for Comparative Molecular Biology and Evolution

Background: Escherichia coli, a model organism, provides information for the annotation of other genomes. Our analysis of its genome has shown that proteins encoded by fused genes need special attention. Such composite (multimodular) proteins consist of two or more components (modules), each with a distinct function. Multimodular proteins have been found to complicate both annotation and the generation of groups of sequence-similar proteins. Previous work overstated the number of multimodular proteins in E. coli; this work corrects the identification of modules by including sequence information from proteins in 50 sequenced microbial genomes.

Results: Multimodular E. coli K-12 proteins were identified from the literature and from sequence similarities between their component modules and non-fused proteins in 50 genomes. We found 109 multimodular proteins in E. coli, each containing either two or three modules. Most modules had standalone sequence relatives in other genomes. The separated modules, together with all single (unfused) proteins, make up the complete set of unimodular proteins of E. coli. Pairwise sequence relationships among all E. coli unimodular proteins yielded 490 sequence-similar, paralogous groups. Groups ranged in size from 2 to 92 members and differed in the degree of relatedness among their members. Some E. coli enzyme groups were compared with their homologs in other bacterial genomes.

Conclusion: The deleterious effects of multimodular proteins on annotation and on the formation of groups of paralogs are emphasized. To improve annotation, all multimodular proteins in an organism should be detected and, where known, each function should be linked to its location in the protein sequence. When functions are transferred by sequence similarity, alignment locations must be recorded, particularly when alignments cover only part of the sequences, so that the correct function is transferred.
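The recommendation to link each function to its location in the protein sequence can be illustrated with a minimal Python sketch. It assumes that alignments of one query protein against standalone (unfused) proteins are already available; the `Hit` record, the 20-residue overlap tolerance, and the greedy best-score-first selection are hypothetical choices for illustration, not the procedure used in this work. Non-overlapping hits are kept as modules, each carrying its own function and residue range, and a protein with two or more such modules is flagged as multimodular.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment of a query protein to a standalone protein (hypothetical format)."""
    start: int      # first aligned residue in the query (1-based, inclusive)
    end: int        # last aligned residue in the query (inclusive)
    function: str   # annotated function of the standalone subject protein
    score: float    # alignment score; higher is better


def overlap(a: Hit, b: Hit) -> int:
    """Number of query residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def assign_modules(hits: list[Hit], max_overlap: int = 20) -> list[Hit]:
    """Greedily keep the best-scoring hits that overlap little with those already kept.

    Each kept hit is treated as a module: its function is attached to its
    own query region (start..end) rather than to the whole protein.
    """
    modules: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(overlap(hit, m) <= max_overlap for m in modules):
            modules.append(hit)
    # Report modules in sequence order, N-terminal first.
    return sorted(modules, key=lambda h: h.start)


def is_multimodular(hits: list[Hit], max_overlap: int = 20) -> bool:
    """Flag a protein as fused when two or more non-overlapping modules are found."""
    return len(assign_modules(hits, max_overlap)) >= 2
```

With toy hits such as `Hit(1, 300, "kinase", 250.0)` and `Hit(320, 800, "dehydrogenase", 400.0)` (invented coordinates), `assign_modules` returns both regions with their separate functions; keeping these coordinates is what lets a later similarity-based transfer pass on only the function of the region an alignment actually covers.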
Separating multimodular proteins into module units makes it possible to generate protein groups related by both sequence and function, avoiding the mixing of unrelated sequences. Organisms differ in the sizes of their groups of sequence-related proteins. A sample comparison of orthologs of selected E. coli paralogous groups correlates with known physiological and taxonomic relationships among the organisms.
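The grouping step can be sketched as single-linkage clustering with a union-find structure. Each unimodular unit (a standalone protein or a separated module) is a node, two nodes are joined when an upstream pairwise comparison judged them sequence-similar, and connected components with at least two members are reported as groups. The `name:1`/`name:2` module labels, the input pair list, and single linkage itself are assumptions for illustration; the abstract does not state the exact grouping criterion.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge sequences into groups."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def paralogous_groups(units: list[str], similar_pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Single-linkage grouping of unimodular sequences.

    `units` lists every standalone protein and every separated module;
    `similar_pairs` lists pairs judged sequence-similar upstream
    (hypothetical input). Singletons are dropped, so every returned
    group has at least two members; larger groups come first.
    """
    uf = UnionFind()
    for u in units:
        uf.find(u)
    for a, b in similar_pairs:
        uf.union(a, b)
    members: dict[str, list[str]] = defaultdict(list)
    for u in units:
        members[uf.find(u)].append(u)
    groups = [sorted(g) for g in members.values() if len(g) >= 2]
    return sorted(groups, key=len, reverse=True)
```

In a toy example, a fused protein split into `fusedP:1` and `fusedP:2` yields two separate groups, (`fusedP:1`, `kinA`, `kinB`) and (`dhA`, `fusedP:2`); left unsplit, the same protein would chain both families into one mixed group of four, which is the mixing of unrelated sequences the text warns about.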

Research Organization:
Marine Biological Lab., Woods Hole, MA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division; National Aeronautics and Space Administration (NASA)
Grant/Contract Number:
FG02-01ER63202; NCC2-1054
OSTI ID:
1626445
Journal Information:
BMC Genomics, Vol. 6, Issue 1; ISSN 1471-2164
Publisher:
Springer
Country of Publication:
United States
Language:
English
