
Title: The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames

Abstract

Discriminant analysis is applied to the problem of recognizing 5'-, internal and 3'-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of particular types. The method is based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine information about significant triplet frequencies of various functional parts of splice site regions and preferences of oligonucleotides in protein coding and intron regions. The accuracy of our splice site recognition function is about 97%. A discriminant function for 5'-exon prediction includes hexanucleotide composition of the upstream region, triplet composition around the ATG codon, ORF coding potential, donor splice site potential and composition of the downstream intron region. For internal exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site and 3'-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79% and a level of pseudoexon ORF prediction of 99.96%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. A discriminant function for 3'-exon prediction includes octanucleotide composition of the upstream intron region, triplet composition around the stop codon, ORF coding potential, acceptor splice site potential and hexanucleotide composition of the downstream region. We unite these three discriminant functions in the exon predicting program FEX (find exons). FEX exactly predicts 70% of 1016 exons from the test set of 181 complete genes with specificity 73%, and 89% of exons are exactly or partially predicted. On average, 85% of nucleotides were predicted accurately with specificity 91%.
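To make the phrase "each open reading frame flanked by GT and AG base pairs" concrete, the following is a minimal Python sketch of how candidate internal exons could be enumerated before any scoring. It is an illustration under stated assumptions, not the FEX implementation: the function names, the `min_len` cutoff and the example sequence are hypothetical, and the discriminant scoring described in the abstract is not reproduced here.

```python
# Hypothetical sketch: enumerate candidate internal exons, i.e. spans that
# follow an AG dinucleotide (acceptor side) and precede a GT dinucleotide
# (donor side) and that contain no stop codon in at least one reading frame.
# All names and the min_len threshold are assumptions made for illustration.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_open_frame(segment):
    """Return True if at least one of the three frames has no stop codon."""
    for frame in range(3):
        codons = (segment[i:i + 3] for i in range(frame, len(segment) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False


def candidate_internal_exons(seq, min_len=6):
    """List half-open (start, end) spans flanked by AG upstream and GT downstream."""
    seq = seq.upper()
    # A candidate exon starts right after an AG dinucleotide.
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # A candidate exon ends right before a GT dinucleotide.
    ends = [j for j in range(len(seq) - 1) if seq[j:j + 2] == "GT"]
    spans = []
    for start in starts:
        for end in ends:
            if end - start >= min_len and has_open_frame(seq[start:end]):
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    example = "CCAGATGGCCAAAGTCC"  # made-up sequence, not from the paper
    for start, end in candidate_internal_exons(example):
        print(start, end, example[start:end])
```

In a full predictor each such span would then be scored, for example by a linear discriminant over splice site and coding features, and the large number of spurious spans this enumeration produces is consistent with the abstract's test set containing far more pseudoexons than true exons.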

Authors:
Solovyev, V V; Salamov, A A; Lawrence, C B [1]
  1. Baylor College of Medicine, Houston, TX (United States)
Publication Date:
Research Org.:
Stanford Univ., CA (United States)
OSTI Identifier:
377163
Report Number(s):
CONF-9408117-
TRN: 96:005197-0043
Resource Type:
Technical Report
Resource Relation:
Conference: 2. international conference on intelligent systems for molecular biology, Stanford, CA (United States), 15-17 Aug 1994; Other Information: PBD: [1994]; Related Information: Is Part Of Proceedings: Second international conference on intelligent systems for molecular biology; Altman, R.; Brutlag, D.; Karp, P.; Lathrop, R.; Searls, D. [eds.]; PB: 389 p.
Country of Publication:
United States
Language:
English
Subject:
55 BIOLOGY AND MEDICINE, BASIC STUDIES; 99 MATHEMATICS, COMPUTERS, INFORMATION SCIENCE, MANAGEMENT, LAW, MISCELLANEOUS; GENES; DATA ANALYSIS; ACCURACY; DNA; NUCLEOTIDES; DNA SEQUENCING; FORECASTING; GENETIC MAPPING

Citation Formats

Solovyev, V V, Salamov, A A, and Lawrence, C B. The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. United States: N. p., 1994. Web.
Solovyev, V V, Salamov, A A, & Lawrence, C B. The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. United States.
Solovyev, V V, Salamov, A A, and Lawrence, C B. "The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames". United States.
@article{osti_377163,
title = {The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames},
author = {Solovyev, V V and Salamov, A A and Lawrence, C B},
abstractNote = {Discriminant analysis is applied to the problem of recognizing 5'-, internal and 3'-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of particular types. The method is based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine information about significant triplet frequencies of various functional parts of splice site regions and preferences of oligonucleotides in protein coding and intron regions. The accuracy of our splice site recognition function is about 97%. A discriminant function for 5'-exon prediction includes hexanucleotide composition of the upstream region, triplet composition around the ATG codon, ORF coding potential, donor splice site potential and composition of the downstream intron region. For internal exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site and 3'-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79% and a level of pseudoexon ORF prediction of 99.96%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. A discriminant function for 3'-exon prediction includes octanucleotide composition of the upstream intron region, triplet composition around the stop codon, ORF coding potential, acceptor splice site potential and hexanucleotide composition of the downstream region. We unite these three discriminant functions in the exon predicting program FEX (find exons). FEX exactly predicts 70% of 1016 exons from the test set of 181 complete genes with specificity 73%, and 89% of exons are exactly or partially predicted. On average, 85% of nucleotides were predicted accurately with specificity 91%.},
url = {https://www.osti.gov/biblio/377163},
place = {United States},
year = {1994},
month = {12}
}

