OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: An editing environment for DNA sequence analysis and annotation

Abstract

This paper presents a computer system for analyzing and annotating large-scale genomic sequences. The core of the system is a multiple-gene structure identification program, which predicts the most probable gene structures based on the given evidence, including pattern recognition, EST and protein homology information. A graphics-based user interface provides an environment which allows the user to interactively control the evidence to be used in the gene identification process. To overcome the computational bottleneck in the database similarity search used in the gene identification process, the authors have developed an effective way to partition a database into a set of sub-databases of related sequences, and reduced the search problem on a large database to a signature identification problem and a search problem on a much smaller sub-database. This reduces the number of sequences to be searched from N to O(√N) on average, and hence greatly reduces the search time, where N is the number of sequences in the original database. The system provides the user with the ability to facilitate and modify the analysis and modeling in real time.
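
The two-stage search described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sub-databases or their signatures are built, so everything below is an assumption made for illustration: the partition is supplied by the caller, a signature is taken to be the summed 3-mer counts of a group's members, and similarity is the number of shared k-mers. The names `kmer_profile`, `similarity`, `build_index`, and `two_stage_search` are hypothetical. If N sequences are split into k groups of roughly equal size, a query costs about k + N/k comparisons, which is smallest near k = √N and gives roughly 2√N comparisons instead of N.

```python
from collections import Counter


def kmer_profile(seq, k=3):
    """Count the overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def similarity(p, q):
    """Number of k-mers shared by two profiles (multiset intersection)."""
    return sum((p & q).values())


def build_index(groups, k=3):
    """Build one (signature, member profiles) entry per sub-database.

    Assumption: `groups` is already a partition into related sequences;
    the signature is simply the sum of the members' k-mer counts.
    """
    index = []
    for members in groups:
        profiles = [(s, kmer_profile(s, k)) for s in members]
        signature = Counter()
        for _, prof in profiles:
            signature.update(prof)
        index.append((signature, profiles))
    return index


def two_stage_search(index, query, k=3):
    """Pick the best-matching signature, then search only that group."""
    q = kmer_profile(query, k)
    # Stage 1: signature identification, one comparison per sub-database.
    _, profiles = max(index, key=lambda entry: similarity(q, entry[0]))
    # Stage 2: similarity search restricted to the chosen sub-database.
    best_seq, _ = max(profiles, key=lambda item: similarity(q, item[1]))
    comparisons = len(index) + len(profiles)
    return best_seq, comparisons


# Toy database: 16 sequences in 4 groups of 4 (made-up data).
bases = ["ACGTACGTAC", "GGGCCCGGGC", "TTTTAAAATT", "CACACAGAGA"]
groups = [[base + tail for tail in "ACGT"] for base in bases]
index = build_index(groups)

hit, comparisons = two_stage_search(index, "TTTTAAAATTG")
print(hit, comparisons)
```

On this toy database the query is compared against 4 signatures and then 4 group members, rather than all 16 sequences. A real system would need a clustering step to produce the groups and a more sensitive alignment score in the second stage.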

Authors:
Uberbacher, E.C.; Xu, Y.; Shah, M.B.; Olman, V.; Parang, M.; Mural, R.
Publication Date:
1998
Research Org.:
Oak Ridge National Lab., TN (United States)
Sponsoring Org.:
USDOE Office of Energy Research, Washington, DC (United States)
OSTI Identifier:
563243
Report Number(s):
ORNL/CP-94756; CONF-980118-
ON: DE98000574; BR: KP1103010; TRN: AHC29803%%80
DOE Contract Number:
AC05-96OR22464
Resource Type:
Technical Report
Resource Relation:
Conference: 3. Pacific symposium on biocomputing, Kapalua, HI (United States), 5 Jan 1998; Other Information: PBD: [1998]
Country of Publication:
United States
Language:
English
Subject:
55 BIOLOGY AND MEDICINE, BASIC STUDIES; 99 MATHEMATICS, COMPUTERS, INFORMATION SCIENCE, MANAGEMENT, LAW, MISCELLANEOUS; DNA SEQUENCING; INFORMATION SYSTEMS; MOLECULAR BIOLOGY; EXPERIMENTAL DATA; GENES; DNA; MOLECULAR STRUCTURE

Citation Formats

Uberbacher, E.C., Xu, Y., Shah, M.B., Olman, V., Parang, M., and Mural, R. An editing environment for DNA sequence analysis and annotation. United States: N. p., 1998. Web. doi:10.2172/563243.
Uberbacher, E.C., Xu, Y., Shah, M.B., Olman, V., Parang, M., & Mural, R. (1998). An editing environment for DNA sequence analysis and annotation. United States. doi:10.2172/563243.
Uberbacher, E.C., Xu, Y., Shah, M.B., Olman, V., Parang, M., and Mural, R. 1998. "An editing environment for DNA sequence analysis and annotation". United States. doi:10.2172/563243. https://www.osti.gov/servlets/purl/563243.
@article{osti_563243,
title = {An editing environment for DNA sequence analysis and annotation},
author = {Uberbacher, E.C. and Xu, Y. and Shah, M.B. and Olman, V. and Parang, M. and Mural, R.},
abstractNote = {This paper presents a computer system for analyzing and annotating large-scale genomic sequences. The core of the system is a multiple-gene structure identification program, which predicts the most probable gene structures based on the given evidence, including pattern recognition, EST and protein homology information. A graphics-based user interface provides an environment which allows the user to interactively control the evidence to be used in the gene identification process. To overcome the computational bottleneck in the database similarity search used in the gene identification process, the authors have developed an effective way to partition a database into a set of sub-databases of related sequences, and reduced the search problem on a large database to a signature identification problem and a search problem on a much smaller sub-database. This reduces the number of sequences to be searched from N to O($\sqrt{N}$) on average, and hence greatly reduces the search time, where N is the number of sequences in the original database. The system provides the user with the ability to facilitate and modify the analysis and modeling in real time.},
doi = {10.2172/563243},
place = {United States},
year = {1998},
month = {12}
}

  • The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.
  • The overall goal of this project was to elucidate the structure/function relationships between oxidized DNA bases and the DNA repair enzymes that recognize and remove them. The NMR solution structure of formamidopyrimidine DNA glycosylase (Fpg), which recognizes oxidized DNA purines, was to be determined. Furthermore, the solution structures of DNA molecules containing specific lesions recognized by Fpg were to be determined in sequence contexts that either facilitate or hinder this recognition. These objectives were in keeping with the long-term goals of the Principal Investigator's laboratory, that is, to understand the basic mechanisms that underpin base excision repair processing of oxidative DNA lesions and to elucidate the interactions of unrepaired lesions with DNA polymerases. The results of these two DNA transactions can ultimately determine the fate of the cell. These objectives were also in keeping with the goals of our collaborator, Dr. Michael Kennedy, who is studying the repair and recognition of damaged DNA. Overall, the goals of this project were congruent with those of the Department of Energy's Health Effects and Life Sciences Research Program, especially the Structural Biology, the Human Genome and the Health Effects Programs. The mission of the latter Program includes understanding the biological effects and consequences of DNA damage produced by toxic agents in the many DOE waste sites so that cleanup can be accomplished in a safe, effective and timely manner.
  • Adaptive resonance theory (ART) describes a class of artificial neural network architectures that act as classification tools which self-organize, work in real-time, and require no retraining to classify novel sequences. We have adapted ART networks to provide support to scientists attempting to categorize tandem repeat DNA fragments from Onchocerca volvulus. In this approach, sequences of DNA fragments are presented to multiple ART-based networks which are linked together into two (or more) tiers; the first provides coarse sequence classification while the subsequent tiers refine the classifications as needed. The overall rating of the resulting classification of fragments is measured using statistical techniques based on those introduced to validate results from traditional phylogenetic analysis. Tests of the Hierarchical ART-based Classification Network, or HABclass network, indicate its value as a fast, easy-to-use classification tool which adapts to new data without retraining on previously classified data.