OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Application of neural networks and other machine learning algorithms to DNA sequence analysis

Abstract

In this article we report initial, quantitative results on the application of simple neural networks, and simple machine learning methods, to two problems in DNA sequence analysis. The two problems we consider are: (1) determination of whether procaryotic and eucaryotic DNA sequence segments are translated to protein. An accuracy of 99.4% is reported for procaryotic DNA (E. coli) and 98.4% for eucaryotic DNA (H. sapiens genes known to be expressed in liver); (2) determination of whether eucaryotic DNA sequence segments containing the dinucleotides "AG" or "GT" are transcribed to RNA splice junctions. An accuracy of 91.2% was achieved on intron/exon splice junctions (acceptor sites) and 92.8% on exon/intron splice junctions (donor sites). The solution of these two problems, by use of information processing algorithms operating on unannotated base sequences and without recourse to biological laboratory work, is relevant to the Human Genome Project. A variety of neural network, machine learning, and information theoretic algorithms are used. The accuracies obtained exceed those of previous investigations for which quantitative results are available in the literature. They result from an ongoing program of research that applies machine learning algorithms to the problem of determining biological function of DNA sequences. Some predictions of possible new genes using these methods are listed -- although a complete survey of the H. sapiens and E. coli sections of GenBank will be given elsewhere. 36 refs., 6 figs., 6 tabs.

Authors:
Lapedes, A; Barnes, C; Burks, C; Farber, R; Sirotkin, K
Publication Date:
Research Org.:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
OSTI Identifier:
6246699
Report Number(s):
LA-UR-89-1788; CONF-881276-2
ON: DE89013457
DOE Contract Number:  
W-7405-ENG-36
Resource Type:
Conference
Resource Relation:
Conference: Interface between computational science and nucleic acid sequencing, Santa Fe, NM, USA, 12 Dec 1988; Other Information: Portions of this document are illegible in microfiche products
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ESCHERICHIA COLI; DNA SEQUENCING; INFORMATION THEORY; OPTIMIZATION; MAN; ALGORITHMS; DESIGN; USES; ANIMALS; BACTERIA; MAMMALS; MATHEMATICAL LOGIC; MICROORGANISMS; PRIMATES; STRUCTURAL CHEMICAL ANALYSIS; VERTEBRATES; 550200* - Biochemistry; 990300 - Information Handling

Citation Formats

Lapedes, A, Barnes, C, Burks, C, Farber, R, and Sirotkin, K. Application of neural networks and other machine learning algorithms to DNA sequence analysis. United States: N. p., 1988. Web. doi:10.4324/9780429501463-15.
Lapedes, A, Barnes, C, Burks, C, Farber, R, & Sirotkin, K. Application of neural networks and other machine learning algorithms to DNA sequence analysis. United States. https://doi.org/10.4324/9780429501463-15
Lapedes, A, Barnes, C, Burks, C, Farber, R, and Sirotkin, K. 1988. "Application of neural networks and other machine learning algorithms to DNA sequence analysis". United States. https://doi.org/10.4324/9780429501463-15. https://www.osti.gov/servlets/purl/6246699.
@article{osti_6246699,
title = {Application of neural networks and other machine learning algorithms to DNA sequence analysis},
author = {Lapedes, A and Barnes, C and Burks, C and Farber, R and Sirotkin, K},
abstractNote = {In this article we report initial, quantitative results on the application of simple neural networks, and simple machine learning methods, to two problems in DNA sequence analysis. The two problems we consider are: (1) determination of whether procaryotic and eucaryotic DNA sequence segments are translated to protein. An accuracy of 99.4% is reported for procaryotic DNA (E. coli) and 98.4% for eucaryotic DNA (H. sapiens genes known to be expressed in liver); (2) determination of whether eucaryotic DNA sequence segments containing the dinucleotides ``AG'' or ``GT'' are transcribed to RNA splice junctions. An accuracy of 91.2% was achieved on intron/exon splice junctions (acceptor sites) and 92.8% on exon/intron splice junctions (donor sites). The solution of these two problems, by use of information processing algorithms operating on unannotated base sequences and without recourse to biological laboratory work, is relevant to the Human Genome Project. A variety of neural network, machine learning, and information theoretic algorithms are used. The accuracies obtained exceed those of previous investigations for which quantitative results are available in the literature. They result from an ongoing program of research that applies machine learning algorithms to the problem of determining biological function of DNA sequences. Some predictions of possible new genes using these methods are listed -- although a complete survey of the H. sapiens and E. coli sections of GenBank will be given elsewhere. 36 refs., 6 figs., 6 tabs.},
doi = {10.4324/9780429501463-15},
url = {https://www.osti.gov/biblio/6246699},
place = {United States},
year = {1988},
month = jan
}

