Combinatorial methods for gene recognition
The major result of the project is the development of a new approach to gene recognition called the spliced alignment algorithm. They have developed an algorithm and implemented a software tool (for both IBM PC and UNIX platforms) which explores all possible exon assemblies in polynomial time and finds the multi-exon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully performs exon assembly even in the case of short exons or exons with unusual codon usage; they also report correct assemblies for genes with more than 10 exons, provided a homologous protein is already known. On a test sample of human genes with known mammalian relatives, the average overlap between the predicted and the actual genes was 99%, which compares remarkably well with other existing methods. The algorithm reconstructed 87% of the genes exactly. The rare discrepancies between the predicted and real exon-intron structures were either restricted to extremely short initial or terminal exons or proved to be results of alternative splicing. Moreover, the algorithm performs reasonably well with non-vertebrate and even prokaryote targets. The spliced alignment software PROCRUSTES has been in extensive use by the academic community since its announcement in August 1996 via the WWW server (www-hto.usc.edu/software/procrustes) and by biotech companies via the in-house UNIX version.
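The idea of exploring all exon assemblies in polynomial time can be illustrated with a small dynamic-programming sketch. This is not the PROCRUSTES implementation: it is a simplified, nucleotide-level version that assumes the candidate exons (blocks) are already given as half-open intervals, compares the genomic sequence with a spliced nucleotide target instead of a related protein, and uses a plain match/mismatch/linear-gap score. The function name `spliced_alignment`, its parameters, and the toy sequences are all hypothetical.

```python
def spliced_alignment(genomic, target, blocks, match=1, mismatch=-1, gap=-1):
    """Best-scoring chain of non-overlapping candidate blocks (toy version).

    blocks: non-empty half-open (start, end) intervals into `genomic`.
    Returns (score, chain), where chain lists the chosen blocks in order.
    Each DP cell holds (score, tuple of block indices used so far).
    """
    n = len(target)
    by_score = lambda cell: cell[0]
    # Empty chain aligned with target[:j]: j gap columns.
    base = [(j * gap, ()) for j in range(n + 1)]
    out = {}       # block index -> DP row after the block's last position
    best = base[n]
    # Process blocks by end coordinate so every possible predecessor is ready.
    for k in sorted(range(len(blocks)), key=lambda b: blocks[b][1]):
        start, end = blocks[k]
        # Entry row: best of the empty chain and any block ending by `start`.
        row = list(base)
        for m, prev in out.items():
            if blocks[m][1] <= start:
                row = [max(a, b, key=by_score) for a, b in zip(row, prev)]
        row = [(score, chain + (k,)) for score, chain in row]
        # Ordinary global alignment of the block against every target prefix.
        for i in range(start, end):
            new = [(row[0][0] + gap, row[0][1])]
            for j in range(1, n + 1):
                sub = match if genomic[i] == target[j - 1] else mismatch
                new.append(max(
                    (row[j - 1][0] + sub, row[j - 1][1]),  # align two symbols
                    (row[j][0] + gap, row[j][1]),          # gap in the target
                    (new[j - 1][0] + gap, new[j - 1][1]),  # gap in the genome
                    key=by_score,
                ))
            row = new
        out[k] = row
        best = max(best, row[n], key=by_score)
    return best[0], [blocks[k] for k in best[1]]


# Toy data: two true exons separated by an intron, plus two decoy blocks.
genomic = "ATGCC" + "GTAAGTTTCAG" + "GATTGA"
target = "ATGCCGATTGA"
blocks = [(0, 5), (2, 9), (8, 13), (16, 22)]
print(spliced_alignment(genomic, target, blocks))
# (11, [(0, 5), (16, 22)])
```

The work grows with the total length of the candidate blocks times the target length, plus a term quadratic in the number of blocks, so no assembly has to be enumerated explicitly even though the number of possible chains is exponential.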
- Research Organization: Department of Computer Science, The Pennsylvania State Univ., University Park, PA 16802 (US)
- Sponsoring Organization: USDOE Office of Energy Research (ER) (US)
- DOE Contract Number: FG02-94ER61919
- OSTI ID: 764709
- Resource Relation: Other Information: PBD: 29 Oct 1997
- Country of Publication: United States
- Language: English