Application of neural networks and other machine learning algorithms to DNA sequence analysis
In this article we report initial quantitative results on the application of simple neural networks, and of simple machine learning methods, to two problems in DNA sequence analysis. The two problems we consider are: (1) determining whether procaryotic and eucaryotic DNA sequence segments are translated to protein, for which we report an accuracy of 99.4% for procaryotic DNA (E. coli) and 98.4% for eucaryotic DNA (H. sapiens genes known to be expressed in liver); and (2) determining whether eucaryotic DNA sequence segments containing the dinucleotides "AG" or "GT" are transcribed into RNA splice junctions, for which accuracies of 91.2% on intron/exon splice junctions (acceptor sites) and 92.8% on exon/intron splice junctions (donor sites) were achieved. Solving these two problems with information-processing algorithms that operate on unannotated base sequences, without recourse to biological laboratory work, is relevant to the Human Genome Project. A variety of neural network, machine learning, and information-theoretic algorithms are used. The accuracies obtained exceed those of previous investigations for which quantitative results are available in the literature. These results come from an ongoing research program that applies machine learning algorithms to the problem of determining the biological function of DNA sequences. Some predictions of possible new genes made with these methods are listed; a complete survey of the H. sapiens and E. coli sections of GenBank will be given elsewhere. 36 refs., 6 figs., 6 tabs.
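The splice-junction problem can be pictured as classifying fixed windows of sequence centred on each "AG" or "GT" occurrence. The sketch below is only an illustration under stated assumptions: the report does not say how its inputs were encoded, so the window size (`flank`), the one-hot encoding, and the single-layer perceptron are hypothetical choices, and the helper names (`candidate_sites`, `one_hot_window`, `train_perceptron`, `perceptron_predict`) are invented for this example.

```python
# Hypothetical sketch: candidate splice-site extraction plus a single-layer
# perceptron. Window size, encoding, and learning rule are assumptions,
# not the method documented in the report.
BASES = "ACGT"


def candidate_sites(seq, dinucleotide):
    """Return 0-based start positions of every occurrence of the dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def one_hot_window(seq, center, flank):
    """One-hot encode seq[center - flank : center + 2 + flank], 4 values per base.

    Returns None if the window runs off either end of the sequence or
    contains a symbol other than A, C, G, T.
    """
    start, end = center - flank, center + 2 + flank
    if start < 0 or end > len(seq):
        return None
    vec = []
    for base in seq[start:end].upper():
        if base not in BASES:
            return None
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def perceptron_predict(x, weights, bias):
    """Linear threshold unit: 1 (site) if w.x + b > 0, else 0 (non-site)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Classic perceptron learning rule on labelled feature vectors."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = y - perceptron_predict(x, weights, bias)
            if err:
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
    return weights, bias


if __name__ == "__main__":
    seq = "CCAGGTAAGTCC"
    print(candidate_sites(seq, "GT"))       # donor-site candidates
    print(candidate_sites(seq, "AG"))       # acceptor-site candidates
    print(len(one_hot_window(seq, 4, 2)))   # 6 bases x 4 values
```

In practice the labels would come from annotated splice junctions (for example GenBank feature tables), and a multi-layer network could replace the single threshold unit; both choices are outside what the abstract specifies.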
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- DOE Contract Number:
- W-7405-ENG-36
- OSTI ID:
- 6246699
- Report Number(s):
- LA-UR-89-1788; CONF-881276-2; ON: DE89013457
- Resource Relation:
- Conference: Interface between computational science and nucleic acid sequencing, Santa Fe, NM, USA, 12 Dec 1988; Other Information: Portions of this document are illegible in microfiche products
- Country of Publication:
- United States
- Language:
- English
Related Subjects
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
ESCHERICHIA COLI
DNA SEQUENCING
INFORMATION THEORY
OPTIMIZATION
MAN
ALGORITHMS
DESIGN
USES
ANIMALS
BACTERIA
MAMMALS
MATHEMATICAL LOGIC
MICROORGANISMS
PRIMATES
STRUCTURAL CHEMICAL ANALYSIS
VERTEBRATES
550200* - Biochemistry
990300 - Information Handling