
Title: Filtering "genic" open reading frames from genomic DNA samples for advanced annotation

Journal Article · BMC Genomics
 [1];  [1];  [2];  [3];  [3];  [1];  [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Division
  2. Univ. of Milan (Italy). School of Pharmacy. Dept. of Structural Chemistry and Inorganic Stereochemistry
  3. Univ. of Eastern Piedmont, Novara (Italy). Dept. of Medical Sciences. IRCAD

Background: Experimental gene annotation requires DNA encoding open reading frames (ORFs) derived from real genes (termed “genic”) in the correct frame. When genes are correctly assigned, genic DNA for functional annotation can be isolated by PCR. However, not all genes are correctly assigned, and even correctly assigned genes often yield incorrectly folded products when expressed in heterologous hosts. This problem can sometimes be overcome by expressing protein fragments corresponding to domains rather than full-length proteins. One possible method to isolate DNA encoding such domains would be to “filter” complex DNA (cDNA libraries, genomic and metagenomic DNA) for gene fragments that confer a selectable phenotype relying on correct folding. The set of all such domains present in a complex DNA sample is termed the “domainome”.

Results: In this paper we discuss the preparation of diverse genic ORF libraries from randomly fragmented genomic DNA, using β-lactamase as a filter for open reading frames. By cloning DNA fragments between leader sequences and the mature β-lactamase gene, colonies can be selected for resistance to ampicillin, which is conferred by correctly folded β-lactamase. Our experiments demonstrate that the majority of surviving colonies contain genic open reading frames, suggesting that β-lactamase is acting as a selectable folding reporter. Furthermore, different leaders (Sec, TAT and SRP), which normally translocate different protein classes, filter different genic fragment subsets, indicating that their use increases the fraction of the “domainome” that is accessible.

Conclusions: The availability of ORF libraries obtained with the filtering method described here, combined with screening methods such as phage display and protein-protein interaction studies, or with protein structure determination projects, can lead to the identification and structural determination of functional genic ORFs. Moreover, ORF libraries are a useful tool for moving towards high-throughput functional annotation of newly sequenced genomes.
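
The reading-frame part of this selection can be illustrated in silico. The sketch below is not the authors' method, which is an in vivo selection on ampicillin; it only models the necessary condition that a fragment cloned between a leader and the reporter must be readable in frame without a stop codon. The function name `passes_frame_filter`, the example fragments, and the assumption that the insert must be a multiple of three nucleotides long and is read from its first base are all hypothetical simplifications. Folding, which the real filter also selects for, is not modelled.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_frame_filter(fragment: str) -> bool:
    """Return True if the fragment could be read through in frame.

    Assumptions (hypothetical, for illustration only):
    - the fragment is read from its first base (frame 0);
    - the vector keeps the downstream reporter in frame only when
      the insert length is a multiple of three.
    """
    seq = fragment.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


# Made-up fragments: one open frame, one with an in-frame stop,
# one whose length would shift the reporter out of frame.
fragments = ["ATGGCTAAAGGT", "ATGTAAGGTGCA", "ATGGCTAAAGG"]
survivors = [f for f in fragments if passes_frame_filter(f)]
```

Under these assumptions only the first fragment survives. In the experimental system a fragment that passes this check can still be lost if the fusion fails to fold or translocate, which is why the abstract describes β-lactamase as a folding reporter and not only a frame reporter.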

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC52-06NA25396
OSTI ID:
1626412
Journal Information:
BMC Genomics, Vol. 12, Issue Suppl 1; ISSN 1471-2164
Publisher:
Springer
Country of Publication:
United States
Language:
English
