skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Application of neural networks and other machine learning algorithms to DNA sequence analysis

Conference ·

In this article we report initial, quantitative results on application of simple neutral networks, and simple machine learning methods, to two problems in DNA sequence analysis. The two problems we consider are: (1) determination of whether procaryotic and eucaryotic DNA sequences segments are translated to protein. An accuracy of 99.4% is reported for procaryotic DNA (E. coli) and 98.4% for eucaryotic DNA (H. Sapiens genes known to be expressed in liver); (2) determination of whether eucaryotic DNA sequence segments containing the dinucleotides ''AG'' or ''GT'' are transcribed to RNA splice junctions. Accuracy of 91.2% was achieved on intron/exon splice junctions (acceptor sites) and 92.8% on exon/intron splice junctions (donor sites). The solution of these two problems, by use of information processing algorithms operating on unannotated base sequences and without recourse to biological laboratory work, is relevant to the Human Genome Project. A variety of neural network, machine learning, and information theoretic algorithms are used. The accuracies obtained exceed those of previous investigations for which quantitative results are available in the literature. They result from an ongoing program of research that applies machine learning algorithms to the problem of determining biological function of DNA sequences. Some predictions of possible new genes using these methods are listed -- although a complete survey of the H. sapiens and E. coli sections of GenBank will be given elsewhere. 36 refs., 6 figs., 6 tabs.

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
DOE Contract Number:
W-7405-ENG-36
OSTI ID:
6246699
Report Number(s):
LA-UR-89-1788; CONF-881276-2; ON: DE89013457
Resource Relation:
Conference: Interface between computational science and nucleic acid sequencing, Santa Fe, NM, USA, 12 Dec 1988; Other Information: Portions of this document are illegible in microfiche products
Country of Publication:
United States
Language:
English