
Title: Gene recognition and assembly in the GRAIL system: Progress and challenges

Conference
OSTI ID:10177796

GRAIL is a comprehensive system being constructed to analyze and characterize the genetic structure of DNA sequences. A number of program modules supply information to the system, including the Coding Recognition Module (CRM), which forms the basis of the current e-mail GRAIL server system. Additional modules determine the positions and scores of possible splice junctions, the positions of potential translation-initiation sites, the coding strand for each gene, and the probable-translation-frame function over the sequence. The Gene Assembly Program module (GAP) attempts to predict the sequence of the spliced mRNA for a gene from the genomic DNA sequence. It constructs and scores gene models, given a DNA sequence and the outputs of the other GRAIL modules for that sequence. GAP tests combinations of those splice junctions that lie within an acceptable distance of the initially predicted edges of the coding regions. Every complete gene model (comprising a translation-initiation site, splice junctions, and a stop codon) that agrees with GAP's set of rules is scored, and the ten highest-scoring models are saved. Each gene model's score depends on the input scores of the splice junctions used in the model, their positions relative to the initially predicted edges of the included coding regions, and the degree of agreement of the entire model with the probable-translation-frame function. If error conditions are detected, the present version of GAP attempts to correct them by inserting and/or deleting one or more coding regions. These insertions and deletions have resulted in a net improvement of gene models, and a particularly large improvement in the recognition and characterization of very short coding regions. The results of GRAIL, including the GAP module, for 26 sequences from GenBank, each with an experimentally characterized gene, are quite promising and demonstrate the feasibility of constructing largely accurate gene models strictly on the basis of DNA sequence data.
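
The assembly step described in the abstract (combine splice junctions near the predicted coding-region edges, score each combination, keep the ten best) can be illustrated with a minimal Python sketch. This is not GAP's actual code: the names `Junction`, `nearby`, and `assemble`, the distance window `max_dist`, and the linear distance penalty `dist_penalty` are all assumptions made for illustration. The sketch also omits translation-initiation sites, stop codons, the translation-frame agreement term, and the insertion or deletion of coding regions.

```python
import heapq
from itertools import product
from typing import NamedTuple


class Junction(NamedTuple):
    pos: int      # position in the genomic sequence
    score: float  # score supplied by a (hypothetical) splice-junction module


def nearby(junctions, edge, max_dist):
    """Junctions within max_dist bases of a predicted coding-region edge."""
    return [j for j in junctions if abs(j.pos - edge) <= max_dist]


def assemble(regions, donors, acceptors, max_dist=20, keep=10, dist_penalty=0.01):
    """Enumerate and score junction combinations for ordered coding regions.

    regions: list of (start, end) predicted coding regions, in sequence order.
    Returns up to `keep` (score, junctions) pairs, best first.
    The scoring formula is an assumed stand-in, not GAP's real function.
    """
    edges, choices = [], []
    # Each gap between adjacent regions needs a donor near the upstream end
    # and an acceptor near the downstream start.
    for (_, end), (start, _) in zip(regions, regions[1:]):
        edges += [end, start]
        choices += [nearby(donors, end, max_dist), nearby(acceptors, start, max_dist)]

    models = []
    for combo in product(*choices):
        # Simple consistency rule: junction positions must strictly increase.
        if any(a.pos >= b.pos for a, b in zip(combo, combo[1:])):
            continue
        # Junction score minus a penalty for distance from the predicted edge.
        score = sum(j.score - dist_penalty * abs(j.pos - e)
                    for j, e in zip(combo, edges))
        models.append((score, combo))
    return heapq.nlargest(keep, models, key=lambda m: m[0])


regions = [(100, 200), (300, 400)]
donors = [Junction(198, 0.9), Junction(230, 0.95)]
acceptors = [Junction(301, 0.8), Junction(310, 0.7)]
best = assemble(regions, donors, acceptors)
```

In this toy input the donor at position 230 is excluded because it lies more than `max_dist` bases from the predicted region end, so only two candidate models are scored.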

Research Organization:
Oak Ridge National Lab., TN (United States)
Sponsoring Organization:
USDOE, Washington, DC (United States)
DOE Contract Number:
AC05-84OR21400
OSTI ID:
10177796
Report Number(s):
CONF-9206273-1; ON: DE92040709
Resource Relation:
Conference: 2. international conference on bioinformatics, supercomputing, and complex genome analysis, St. Petersburg, FL (United States), 4-7 Jun 1992; Other Information: PBD: [1992]
Country of Publication:
United States
Language:
English