Spliced alignment: A new approach to gene recognition
- Inst. of Protein Research, Puschino (Russian Federation)
- NIIGENETIKA, Moscow (Russian Federation)
- Univ. of California, Los Angeles, CA (United States)
Gene structure prediction is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics and artificial intelligence, and, surprisingly, methods from theoretical computer science have remained almost unexplored for gene recognition. Recent advances in large-scale cDNA sequencing open the way to a new, combinatorial approach to gene recognition. This paper describes a spliced alignment algorithm, and a software tool implementing it, that explores all possible exon assemblies in polynomial time and finds the multi-exon structure that best fits a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even when they contain short exons or exons with unusual codon usage; the authors also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual genes was 99%, a very high accuracy compared with other existing methods. The algorithm correctly reconstructed 87% of the genes, and the rare discrepancies between predicted and real exon-intron structures were caused by (i) extremely short (fewer than 5 amino acids) initial or terminal exons, (ii) alternative splicing, or (iii) errors in database feature tables. 38 refs., 3 tabs.
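The core idea of spliced alignment, choosing a chain of non-overlapping candidate blocks (putative exons) whose concatenation aligns best to a target sequence, can be illustrated with a small dynamic program. This is a minimal, hypothetical sketch rather than the authors' implementation. The function name `spliced_alignment`, the toy scoring constants `MATCH`, `MISMATCH` and `INDEL`, and the 0-based inclusive `(first, last)` block coordinates are assumptions. The candidate blocks are simply taken as given, and the target is a nucleotide string rather than a related protein, which avoids translating codons. Each block's table is seeded from the best final row among all blocks that end before it starts, so every chain of non-overlapping blocks is considered in polynomial time.

```python
from typing import Dict, List, Optional, Tuple

# Toy scoring scheme (assumption, not taken from the paper).
MATCH, MISMATCH, INDEL = 1, -1, -1


def delta(a: str, b: str) -> int:
    """Substitution score for aligning two characters."""
    return MATCH if a == b else MISMATCH


def spliced_alignment(genome: str, blocks: List[Tuple[int, int]],
                      target: str) -> Tuple[int, List[Tuple[int, int]]]:
    """Hypothetical sketch: best-scoring non-empty chain of non-overlapping
    blocks whose concatenation aligns globally to `target`.
    Blocks are 0-based inclusive (first, last) intervals of `genome`."""
    m = len(target)
    if not blocks:
        return m * INDEL, []
    # Process blocks by end position so every possible predecessor is ready.
    order = sorted(range(len(blocks)), key=lambda k: blocks[k][1])
    tables: Dict[int, List[List[int]]] = {}         # block -> DP rows
    entry: Dict[int, List[int]] = {}                 # block -> row feeding its first base
    entry_from: Dict[int, List[Optional[int]]] = {}  # block that supplied entry[k][j]

    for k in order:
        first, last = blocks[k]
        # Entry row: start a new chain (target prefix aligned to gaps) or
        # continue from any block that ends strictly before this one starts.
        prev = [j * INDEL for j in range(m + 1)]
        src: List[Optional[int]] = [None] * (m + 1)
        for l, rows in tables.items():
            if blocks[l][1] < first:
                for j in range(m + 1):
                    if rows[-1][j] > prev[j]:
                        prev[j], src[j] = rows[-1][j], l
        entry[k], entry_from[k] = prev, src
        rows_k: List[List[int]] = []
        for i in range(first, last + 1):
            row = [prev[0] + INDEL] + [0] * m
            for j in range(1, m + 1):
                row[j] = max(prev[j - 1] + delta(genome[i], target[j - 1]),
                             prev[j] + INDEL,     # genome base vs. gap
                             row[j - 1] + INDEL)  # target base vs. gap
            rows_k.append(row)
            prev = row
        tables[k] = rows_k

    best = max(tables, key=lambda k: tables[k][-1][m])
    score = tables[best][-1][m]

    # Traceback: walk back through cells, jumping to the predecessor block
    # whenever the path leaves a block through its entry row.
    chain: List[Tuple[int, int]] = []
    cur: Optional[int] = best
    r, j = len(tables[best]) - 1, m
    while cur is not None:
        chain.append(blocks[cur])
        rows, first = tables[cur], blocks[cur][0]
        while True:
            prev = rows[r - 1] if r > 0 else entry[cur]
            v = rows[r][j]
            if j > 0 and v == rows[r][j - 1] + INDEL:
                j -= 1  # target base aligned to a gap; stay in this row
                continue
            # A diagonal step consumes one target base; if it does not apply,
            # the move is a genome base against a gap. Both go up one row.
            if j > 0 and v == prev[j - 1] + delta(genome[first + r], target[j - 1]):
                j -= 1
            if r == 0:
                break
            r -= 1
        cur = entry_from[cur][j]
        if cur is not None:
            r = len(tables[cur]) - 1
    return score, chain[::-1]
```

For example, `spliced_alignment("GGGGTTTTCCCC", [(0, 3), (4, 7), (8, 11), (2, 9)], "GGGGCCCC")` skips the `TTTT` block and the overlapping decoy `(2, 9)`, returning a score of 8 with the chain `[(0, 3), (8, 11)]`. Seeding each block by scanning all earlier blocks costs time quadratic in the number of blocks; a refined version could keep a running maximum over blocks ending at each genome position instead.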
- OSTI ID: 495270
- Report Number(s): CONF-960679-; TRN: 97:000617-0002
- Resource Relation: Conference: 7th Symposium on Combinatorial Pattern Matching, Laguna Beach, CA (United States), 10-12 Jun 1996; Other Information: PBD: 1996; Related Information: Is Part Of Combinatorial pattern matching; Hirschberg, D.; Myers, G. [eds.]; PB: 393 p.
- Country of Publication: United States
- Language: English
Similar Records
Las Vegas algorithms for gene recognition: Suboptimal and error-tolerant spliced alignment
The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames