OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames

Technical Report · OSTI ID: 377163
; ;  [1]
  1. Baylor College of Medicine, Houston, TX (United States)

Discriminant analysis is applied to the problem of recognizing 5'-, internal and 3'-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of particular types. The method is based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine information about significant triplet frequencies in various functional parts of splice site regions and oligonucleotide preferences in protein-coding and intron regions. The accuracy of our splice site recognition function is about 97%. A discriminant function for 5'-exon prediction includes hexanucleotide composition of the upstream region, triplet composition around the ATG codon, ORF coding potential, donor splice site potential and composition of the downstream intron region. For internal exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site and 3'-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79% and a level of pseudoexon ORF prediction of 99.96%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. A discriminant function for 3'-exon prediction includes octanucleotide composition of the upstream intron region, triplet composition around the stop codon, ORF coding potential, acceptor splice site potential and hexanucleotide composition of the downstream region. We unite these three discriminant functions in the exon-predicting program FEX (find exons). FEX exactly predicts 70% of 1016 exons from the test set of 181 complete genes with a specificity of 73%, and 89% of exons are exactly or partially predicted. On average, 85% of nucleotides were predicted accurately with a specificity of 91%.
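
To illustrate how a linear Fisher discriminant can combine several per-candidate scores into a single exon/pseudoexon decision, here is a minimal sketch. It is not the FEX implementation. The three features (donor score, coding potential, acceptor score), the toy training values, the function names `fit_fisher` and `is_exon`, and the midpoint decision threshold are all assumptions made for illustration; the report does not specify them in this abstract.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows: Sequence[Sequence[float]], mean: Vector) -> Matrix:
    """Sum of outer products of deviations from the class mean."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular within-class scatter matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_fisher(pos: Sequence[Sequence[float]],
               neg: Sequence[Sequence[float]]) -> Tuple[Vector, float]:
    """Return Fisher direction w = Sw^-1 (mean_pos - mean_neg) and a threshold."""
    mp, mn = mean_vector(pos), mean_vector(neg)
    sp, sn = scatter_matrix(pos, mp), scatter_matrix(neg, mn)
    sw = [[x + y for x, y in zip(rp, rn)] for rp, rn in zip(sp, sn)]
    w = solve(sw, [x - y for x, y in zip(mp, mn)])
    # Assumption: cut halfway between the projected class means.
    threshold = 0.5 * (dot(w, mp) + dot(w, mn))
    return w, threshold


def is_exon(features: Sequence[float], w: Vector, threshold: float) -> bool:
    return dot(w, features) > threshold


# Toy, made-up feature vectors: [donor score, coding potential, acceptor score].
exons = [[0.9, 0.9, 0.9], [0.9, 0.7, 0.7], [0.7, 0.9, 0.7], [0.7, 0.7, 0.9]]
pseudoexons = [[0.3, 0.3, 0.3], [0.3, 0.1, 0.1], [0.1, 0.3, 0.1], [0.1, 0.1, 0.3]]

w, threshold = fit_fisher(exons, pseudoexons)
print(is_exon([0.8, 0.6, 0.7], w, threshold))
print(is_exon([0.2, 0.5, 0.3], w, threshold))
```

With test sets as imbalanced as the one described above (451 exons against 246693 pseudoexons), a midpoint threshold would probably be too permissive; in practice the cut-off would be tuned on held-out data to reach the desired trade-off between sensitivity and specificity.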

Research Organization:
Stanford Univ., CA (United States)
Report Number(s):
CONF-9408117-; TRN: 96:005197-0043
Resource Relation:
Conference: 2. international conference on intelligent systems for molecular biology, Stanford, CA (United States), 15-17 Aug 1994; Other Information: PBD: [1994]; Related Information: Is Part Of Proceedings: Second international conference on intelligent systems for molecular biology; Altman, R.; Brutlag, D.; Karp, P.; Lathrop, R.; Searls, D. [eds.]; PB: 389 p.
Country of Publication:
United States
Language:
English