Title: Systematic identification and analysis of frequent gene fusion events in metabolic pathways

Abstract

Gene fusions are the most powerful type of in silico-derived functional associations. However, many fusion compilations were made when fewer than 100 genomes were available, and algorithms for identifying fusions need updating to handle the current avalanche of sequenced genomes. The availability of a large fusion dataset would help probe functional associations and enable systematic analysis of where and why fusion events occur. Here we present a systematic analysis of fusions in prokaryotes. We manually generated two training sets: (i) 121 fusions in the model organism Escherichia coli; (ii) 131 fusions found in B vitamin metabolism. These sets were used to develop a fusion prediction algorithm that captured the training set fusions with only 7% false negatives and 50% false positives, a substantial improvement over existing approaches. This algorithm was then applied to identify 3.8 million potential fusions across 11,473 genomes. The results of the analysis are available in a searchable database. A functional analysis identified 3,000 reactions associated with frequent fusion events and revealed areas of metabolism where fusions are particularly prevalent. Customary definitions of fusions were shown to be ambiguous, and a stricter one was proposed. Exploring the genes participating in fusion events showed that they most commonly encode transporters, regulators, and metabolic enzymes. The major rationales for fusions between metabolic genes appear to be overcoming pathway bottlenecks, avoiding toxicity, controlling competing pathways, and facilitating expression and assembly of protein complexes. Finally, our fusion dataset provides powerful clues to decipher the biological activities of domains of unknown function.

Authors:
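 Henry, Christopher S. [1]; Lerma-Ortiz, Claudia [2]; Gerdes, Svetlana Y. [3]; Mullen, Jeffrey D. [4]; Colasanti, Ric [4]; Zhukov, Aleksey [2]; Frelin, Oceane [2]; Thiaville, Jennifer J. [2]; Zallot, Remi [2]; Niehaus, Thomas D. [2]; Hasnain, Ghulam [2]; Conrad, Neal [4]; Hanson, Andrew D. [2]; de Crecy-Lagard, Valerie [2]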
  1. Argonne National Lab. (ANL), Argonne, IL (United States); The Univ. of Chicago, Chicago, IL (United States)
  2. Univ. of Florida, Gainesville, FL (United States)
  3. Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Florida, Gainesville, FL (United States)
  4. Argonne National Lab. (ANL), Argonne, IL (United States)
Publication Date:
2016-06-24
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
National Science Foundation (NSF); USDOE Office of Science (SC)
OSTI Identifier:
1392643
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
BMC Genomics
Additional Journal Information:
Journal Volume: 17; Journal Issue: 1; Journal ID: ISSN 1471-2164
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 60 APPLIED LIFE SCIENCES; Gene fusions; Escherichia coli; B vitamin pathways; Metabolic modeling; Essential reactions; Bottlenecks

Citation Formats

Henry, Christopher S., Lerma-Ortiz, Claudia, Gerdes, Svetlana Y., Mullen, Jeffrey D., Colasanti, Ric, Zhukov, Aleksey, Frelin, Oceane, Thiaville, Jennifer J., Zallot, Remi, Niehaus, Thomas D., Hasnain, Ghulam, Conrad, Neal, Hanson, Andrew D., and de Crecy-Lagard, Valerie. Systematic identification and analysis of frequent gene fusion events in metabolic pathways. United States: N. p., 2016. Web. doi:10.1186/s12864-016-2782-3.
Henry, Christopher S., Lerma-Ortiz, Claudia, Gerdes, Svetlana Y., Mullen, Jeffrey D., Colasanti, Ric, Zhukov, Aleksey, Frelin, Oceane, Thiaville, Jennifer J., Zallot, Remi, Niehaus, Thomas D., Hasnain, Ghulam, Conrad, Neal, Hanson, Andrew D., & de Crecy-Lagard, Valerie. Systematic identification and analysis of frequent gene fusion events in metabolic pathways. United States. doi:10.1186/s12864-016-2782-3.
Henry, Christopher S., Lerma-Ortiz, Claudia, Gerdes, Svetlana Y., Mullen, Jeffrey D., Colasanti, Ric, Zhukov, Aleksey, Frelin, Oceane, Thiaville, Jennifer J., Zallot, Remi, Niehaus, Thomas D., Hasnain, Ghulam, Conrad, Neal, Hanson, Andrew D., and de Crecy-Lagard, Valerie. 2016. "Systematic identification and analysis of frequent gene fusion events in metabolic pathways". United States. doi:10.1186/s12864-016-2782-3. https://www.osti.gov/servlets/purl/1392643.
@article{osti_1392643,
title = {Systematic identification and analysis of frequent gene fusion events in metabolic pathways},
author = {Henry, Christopher S. and Lerma-Ortiz, Claudia and Gerdes, Svetlana Y. and Mullen, Jeffrey D. and Colasanti, Ric and Zhukov, Aleksey and Frelin, Oceane and Thiaville, Jennifer J. and Zallot, Remi and Niehaus, Thomas D. and Hasnain, Ghulam and Conrad, Neal and Hanson, Andrew D. and de Crecy-Lagard, Valerie},
doi = {10.1186/s12864-016-2782-3},
journal = {BMC Genomics},
number = 1,
volume = 17,
place = {United States},
year = {2016},
month = {6}
}
