An editing environment for DNA sequence analysis and annotation
Abstract
This paper presents a computer system for analyzing and annotating large-scale genomic sequences. The core of the system is a multiple-gene structure identification program, which predicts the most probable gene structures from the available evidence, including pattern recognition, EST and protein homology information. A graphics-based user interface provides an environment that lets the user interactively control which evidence is used in the gene identification process. To overcome the computational bottleneck of the database similarity search used in gene identification, the authors have developed an effective way to partition a database into a set of sub-databases of related sequences, reducing a search on a large database to a signature identification problem plus a search on a much smaller sub-database. This reduces the number of sequences to be searched from N, the number of sequences in the original database, to O(√N) on average, and hence greatly reduces search time. The system allows the user to direct and modify the analysis and modeling in real time.
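The abstract does not say how the sub-databases or their signatures are built, so the following is only an illustration of why a two-stage search can scale as O(√N), under assumptions of our own rather than the authors': each sequence is summarized by its overlapping k-mer counts (the helpers `kmer_profile`, `similarity`, `closest`, `partition`, and `two_stage_search`, and the choice k = 3, are all hypothetical), sub-databases are formed around √N randomly chosen representative sequences, and a representative's profile stands in for the sub-database "signature". If N sequences split into m groups of about N/m each, a query costs roughly m + N/m comparisons, which is smallest at m = √N and gives about 2√N; for N = 1,000,000 that is about 2,000 comparisons instead of 1,000,000.

```python
import math
import random
from collections import Counter


# Hypothetical signature: a multiset of overlapping k-mers (k = 3 is arbitrary).
def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count the overlapping k-mers of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def similarity(p: Counter, q: Counter) -> int:
    """Shared k-mers between two profiles, counting multiplicity."""
    return sum((p & q).values())


def closest(profile: Counter, candidates: list) -> int:
    """Index of the most similar candidate profile (first one wins ties)."""
    return max(range(len(candidates)), key=lambda i: similarity(profile, candidates[i]))


def partition(db: list, seed: int = 0):
    """Split db into about sqrt(N) sub-databases around random representatives.

    Returns (reps, rep_profiles, groups): reps are indices into db,
    rep_profiles are their k-mer profiles, and groups[i] lists the indices
    of the sequences placed in sub-database i.
    """
    n = len(db)
    m = max(1, math.isqrt(n))
    reps = random.Random(seed).sample(range(n), m)
    rep_profiles = [kmer_profile(db[r]) for r in reps]
    # Each representative seeds its own group, so no group is empty.
    groups = [[r] for r in reps]
    rep_set = set(reps)
    for idx, seq in enumerate(db):
        if idx not in rep_set:
            groups[closest(kmer_profile(seq), rep_profiles)].append(idx)
    return reps, rep_profiles, groups


def two_stage_search(query: str, db: list, rep_profiles: list, groups: list):
    """Stage 1: match the query against the signatures.
    Stage 2: scan only the chosen sub-database.

    Returns (best_index, comparisons), where comparisons counts both stages.
    """
    qp = kmer_profile(query)
    g = closest(qp, rep_profiles)
    members = groups[g]
    best = max(members, key=lambda j: similarity(qp, kmer_profile(db[j])))
    return best, len(rep_profiles) + len(members)


if __name__ == "__main__":
    rng = random.Random(1)
    db = ["".join(rng.choice("ACGT") for _ in range(60)) for _ in range(400)]
    reps, rep_profiles, groups = partition(db, seed=2)
    hit, cost = two_stage_search(db[0], db, rep_profiles, groups)
    print(f"hit={hit} comparisons={cost} (full scan: {len(db)})")
```

Because the representatives are picked at random, this sketch does not guarantee balanced groups, so the √N cost holds only when the groups come out roughly even; it can also miss a best hit that lies in a different sub-database. Any practical partitioning scheme has to manage both trade-offs, and the abstract does not describe how the authors handle them.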
- Authors:
- Uberbacher, E.C.; Xu, Y.; Shah, M.B.; Olman, V.; Parang, M.; Mural, R.
- Publication Date:
- 1998
- Research Org.:
- Oak Ridge National Lab., TN (United States)
- Sponsoring Org.:
- USDOE Office of Energy Research, Washington, DC (United States)
- OSTI Identifier:
- 563243
- Report Number(s):
- ORNL/CP-94756; CONF-980118-
- ON: DE98000574; BR: KP1103010; TRN: AHC29803%%80
- DOE Contract Number:
- AC05-96OR22464
- Resource Type:
- Technical Report
- Resource Relation:
- Conference: 3rd Pacific Symposium on Biocomputing, Kapalua, HI (United States), 5 Jan 1998; Other Information: PBD: [1998]
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 55 BIOLOGY AND MEDICINE, BASIC STUDIES; 99 MATHEMATICS, COMPUTERS, INFORMATION SCIENCE, MANAGEMENT, LAW, MISCELLANEOUS; DNA SEQUENCING; INFORMATION SYSTEMS; MOLECULAR BIOLOGY; EXPERIMENTAL DATA; GENES; DNA; MOLECULAR STRUCTURE
Citation Formats
Uberbacher, E.C., Xu, Y., Shah, M.B., Olman, V., Parang, M., and Mural, R. An editing environment for DNA sequence analysis and annotation. United States: N. p., 1998. Web. doi:10.2172/563243.
Uberbacher, E.C., Xu, Y., Shah, M.B., Olman, V., Parang, M., & Mural, R. An editing environment for DNA sequence analysis and annotation. United States. doi:10.2172/563243.
Uberbacher, E.C., Xu, Y., Shah, M.B., Olman, V., Parang, M., and Mural, R. 1998. "An editing environment for DNA sequence analysis and annotation." United States. doi:10.2172/563243. https://www.osti.gov/servlets/purl/563243.
@techreport{osti_563243,
title = {An editing environment for DNA sequence analysis and annotation},
author = {Uberbacher, E.C. and Xu, Y. and Shah, M.B. and Olman, V. and Parang, M. and Mural, R.},
abstractNote = {This paper presents a computer system for analyzing and annotating large-scale genomic sequences. The core of the system is a multiple-gene structure identification program, which predicts the most probable gene structures from the available evidence, including pattern recognition, EST and protein homology information. A graphics-based user interface provides an environment that lets the user interactively control which evidence is used in the gene identification process. To overcome the computational bottleneck of the database similarity search used in gene identification, the authors have developed an effective way to partition a database into a set of sub-databases of related sequences, reducing a search on a large database to a signature identification problem plus a search on a much smaller sub-database. This reduces the number of sequences to be searched from N, the number of sequences in the original database, to $O(\sqrt{N})$ on average, and hence greatly reduces search time. The system allows the user to direct and modify the analysis and modeling in real time.},
doi = {10.2172/563243},
institution = {Oak Ridge National Lab., TN (United States)},
number = {ORNL/CP-94756},
place = {United States},
year = {1998}
}