The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames
Technical Report
·
OSTI ID:377163
- Baylor College of Medicine, Houston, TX (United States)
Discriminant analysis is applied to the problem of recognizing 5'-, internal and 3'-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of each type. The method is based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine information about significant triplet frequencies in various functional parts of splice site regions with oligonucleotide preferences in protein-coding and intron regions. The accuracy of our splice site recognition function is about 97%. The discriminant function for 5'-exon prediction includes the hexanucleotide composition of the upstream region, triplet composition around the ATG codon, ORF coding potential, donor splice site potential and composition of the downstream intron region. For internal exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site and 3'-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79% and a level of pseudoexon ORF prediction of 99.96%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. The discriminant function for 3'-exon prediction includes the octanucleotide composition of the upstream intron region, triplet composition around the stop codon, ORF coding potential, acceptor splice site potential and hexanucleotide composition of the downstream region. We unite these three discriminant functions in the exon-predicting program FEX (find exons). FEX exactly predicts 70% of 1016 exons from a test set of 181 complete genes with a specificity of 73%, and 89% of exons are exactly or partially predicted. On average, 85% of nucleotides were predicted accurately, with a specificity of 91%.
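To make the central idea of the abstract concrete, the following is a minimal sketch of a linear Fisher discriminant that combines several per-candidate scores into one decision. It is not the FEX implementation: the function names, the two-feature toy data (imagined as a coding potential score and a splice site score) and the midpoint threshold are all assumptions made for illustration. The accuracy helper assumes the convention common in gene prediction, where sensitivity is TP/(TP+FN) and specificity is TP/(TP+FP); the abstract does not state its definitions explicitly.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

# Toy feature vectors, invented for illustration only (not data from the report).
# Each row is a hypothetical [coding_potential, splice_site_score] pair.
TOY_EXONS: List[Vector] = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0], [3.0, 3.0]]
TOY_PSEUDOEXONS: List[Vector] = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(rows: Sequence[Vector]) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows: Sequence[Vector], mu: Vector) -> List[Vector]:
    """Sum of outer products of deviations from the class mean."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("within-class scatter matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_fisher(pos: Sequence[Vector], neg: Sequence[Vector]) -> Tuple[Vector, float]:
    """Return Fisher weights w = Sw^-1 (mu_pos - mu_neg) and a threshold."""
    mu_p, mu_n = mean(pos), mean(neg)
    s_p, s_n = scatter(pos, mu_p), scatter(neg, mu_n)
    s_w = [[a + b for a, b in zip(rp, rn)] for rp, rn in zip(s_p, s_n)]
    w = solve(s_w, [p - q for p, q in zip(mu_p, mu_n)])
    # Assumption: threshold halfway between the projected class means.
    threshold = 0.5 * (dot(w, mu_p) + dot(w, mu_n))
    return w, threshold


def is_exon(w: Vector, threshold: float, features: Vector) -> bool:
    return dot(w, features) > threshold


def sensitivity_specificity(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Sensitivity TP/(TP+FN); specificity TP/(TP+FP) (assumed convention)."""
    return tp / (tp + fn), tp / (tp + fp)


if __name__ == "__main__":
    w, threshold = fit_fisher(TOY_EXONS, TOY_PSEUDOEXONS)
    print("weights:", w, "threshold:", threshold)
    print("candidate [2.5, 2.0] called exon:", is_exon(w, threshold, [2.5, 2.0]))
```

Under the assumed definitions, the internal exon figures are mutually consistent: 77% of 451 exons is roughly 347 true positives, a specificity of 79% then implies roughly 92 false positives, and 92 of 246693 pseudoexons is about 0.04%, which matches the reported 99.96% of pseudoexons correctly rejected.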
- Research Organization:
- Stanford Univ., CA (United States)
- OSTI ID:
- 377163
- Report Number(s):
- CONF-9408117--
- Country of Publication:
- United States
- Language:
- English
Similar Records
Structural organization of the human microsomal glutathione S-transferase gene (GST12)
Journal Article
·
Thu Aug 15 00:00:00 EDT 1996
· Genomics
·
OSTI ID:484329
Identification of human gene structure using linear discriminant functions and dynamic programming
Technical Report
·
Sat Dec 30 23:00:00 EST 1995
·
OSTI ID:401866
The length of the downstream exon and the substitution of specific sequences affect pre-mRNA splicing in vitro
Journal Article
·
Sun Jan 31 23:00:00 EST 1988
· Mol. Cell. Biol.; (United States)
·
OSTI ID:5113683