OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach

Abstract

Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. The authors describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, the authors' method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration, the coding recognition module identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which the authors are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.
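
Illustrative note (not part of the original record): the abstract describes a set of sensor algorithms whose outputs are combined by a neural network. The Python sketch below shows only the general shape of such a pipeline. Everything in it is an assumption made for illustration: the two sensors (`gc_fraction`, `frame_bias`), the network size, the placeholder weights, and the window and step sizes are hypothetical and are not the authors' sensors, architecture, or trained parameters.

```python
import math


def gc_fraction(window):
    """Hypothetical sensor 1: fraction of G or C bases in the window."""
    return sum(base in "GC" for base in window) / len(window)


def frame_bias(window):
    """Hypothetical sensor 2: spread of G/C usage across the three codon positions."""
    counts = [0, 0, 0]
    totals = [0, 0, 0]
    for i, base in enumerate(window):
        totals[i % 3] += 1
        counts[i % 3] += base in "GC"
    fractions = [c / t for c, t in zip(counts, totals) if t]
    return max(fractions) - min(fractions)


SENSORS = (gc_fraction, frame_bias)

# Placeholder weights chosen by hand for illustration only.
# A real system would learn these from labelled coding and non-coding windows.
HIDDEN_WEIGHTS = [[2.0, 1.0], [-1.0, 3.0]]
HIDDEN_BIASES = [-1.0, -0.5]
OUTPUT_WEIGHTS = [1.5, 1.5]
OUTPUT_BIAS = -1.5


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def coding_score(window):
    """Feed the sensor readings through a tiny two-layer network; returns a value in (0, 1)."""
    inputs = [sensor(window) for sensor in SENSORS]
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)


def scan(sequence, window_size=99, step=10):
    """Slide a window along the sequence and score each position.

    window_size and step are arbitrary illustrative values.
    """
    sequence = sequence.upper()
    return [
        (start, coding_score(sequence[start:start + window_size]))
        for start in range(0, len(sequence) - window_size + 1, step)
    ]
```

Calling `scan(dna_string)` returns a list of `(start, score)` pairs. Localizing a coding region would then amount to thresholding and merging adjacent high-scoring windows, a step this sketch leaves out.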

Authors:
Uberbacher, E C; Mural, R J [1]
  1. Oak Ridge National Lab., TN (United States); Univ. of Tennessee, Oak Ridge (United States)
Publication Date:
1991-12
OSTI Identifier:
5604872
DOE Contract Number:  
AC05-84OR21400
Resource Type:
Journal Article
Journal Name:
Proceedings of the National Academy of Sciences of the United States of America; (United States)
Additional Journal Information:
Journal Volume: 88; Journal Issue: 24; Journal ID: ISSN 0027-8424
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; GENES; MOLECULAR STRUCTURE; ALKALINE PHOSPHATASE; BIOLOGICAL MARKERS; DNA SEQUENCING; MAN; PATTERN RECOGNITION; PHOSPHORUS-GROUP TRANSFERASES; PHOSPHOTRANSFERASES; PROTEINS; PROTHROMBIN; ANIMALS; BLOOD COAGULATION FACTORS; COAGULANTS; DRUGS; ENZYMES; ESTERASES; HEMATOLOGIC AGENTS; HYDROLASES; MAMMALS; ORGANIC COMPOUNDS; PHOSPHATASES; PRIMATES; STRUCTURAL CHEMICAL ANALYSIS; TRANSFERASES; VERTEBRATES; 550200* - Biochemistry

Citation Formats

Uberbacher, E C, and Mural, R J. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. United States: N. p., 1991. Web. doi:10.1073/pnas.88.24.11261.
Uberbacher, E C, & Mural, R J. (1991). Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. United States. doi:10.1073/pnas.88.24.11261.
Uberbacher, E C, and Mural, R J. 1991. "Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach". United States. doi:10.1073/pnas.88.24.11261.
@article{osti_5604872,
title = {Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach},
author = {Uberbacher, E C and Mural, R J},
abstractNote = {Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. The authors describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, the authors' method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration, the coding recognition module identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which the authors are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.},
doi = {10.1073/pnas.88.24.11261},
journal = {Proceedings of the National Academy of Sciences of the United States of America; (United States)},
issn = {0027-8424},
number = {24},
volume = {88},
place = {United States},
year = {1991},
month = {12}
}