Title: Combinatorial methods for gene recognition

The major result of the project is the development of a new approach to gene recognition called the spliced alignment algorithm. The authors developed the algorithm and implemented a software tool (for both IBM PC and UNIX platforms) that explores all possible exon assemblies in polynomial time and finds the multi-exon structure with the best fit to a related protein. Unlike other existing methods, the algorithm assembles exons successfully even when they are short or have unusual codon usage; it also produces correct assemblies for genes with more than 10 exons, provided a homologous protein is already known. On a test sample of human genes with known mammalian relatives, the average overlap between the predicted and the actual genes was 99%, which compares remarkably well with other existing methods. Moreover, the algorithm reconstructed 87% of the genes with complete accuracy. The rare discrepancies between the predicted and real exon-intron structures were either restricted to extremely short initial or terminal exons or proved to be the result of alternative splicing. The algorithm also performs reasonably well with non-vertebrate and even prokaryotic targets. The spliced alignment software PROCRUSTES has been in extensive use by the academic community since its announcement in August 1996 via the WWW server (www-hto.usc.edu/software/procrustes) and by biotech companies via the in-house UNIX version.
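
To make concrete the idea of exploring all exon assemblies in polynomial time, the following is a minimal Python sketch of a spliced-alignment-style dynamic program. It is not the PROCRUSTES implementation. It assumes a toy setting that the report does not specify: candidate exons are given as half-open `(start, end)` intervals on a nucleotide string, the target is also a nucleotide string rather than a protein, and scoring uses simple match, mismatch, and gap values. The function name `spliced_alignment` and all parameters are hypothetical.

```python
def spliced_alignment(genomic, target, blocks, match=1, mismatch=-1, gap=-1):
    """Find the chain of non-overlapping blocks whose concatenation
    aligns best (globally) to target. Returns (score, chain)."""
    m = len(target)
    # Process candidate blocks in order of their end coordinate.
    blocks = sorted(blocks, key=lambda b: (b[1], b[0]))
    # Row for the empty chain: target prefix aligned against gaps only.
    empty_row = [(j * gap, ()) for j in range(m + 1)]
    final_rows = []  # final_rows[k][j]: best (score, chain) ending with blocks[k]

    for k, (start, end) in enumerate(blocks):
        # Entry row: best of the empty chain and any block ending before this one.
        row = list(empty_row)
        for p in range(k):
            if blocks[p][1] <= start:
                for j in range(m + 1):
                    if final_rows[p][j][0] > row[j][0]:
                        row[j] = final_rows[p][j]
        # Record that the chain now includes this block.
        row = [(score, chain + ((start, end),)) for score, chain in row]
        # Standard global-alignment recurrence over the block's characters.
        for ch in genomic[start:end]:
            new = [(row[0][0] + gap, row[0][1])]
            for j in range(1, m + 1):
                sub = match if ch == target[j - 1] else mismatch
                candidates = [
                    (row[j - 1][0] + sub, row[j - 1][1]),  # align ch to target[j-1]
                    (row[j][0] + gap, row[j][1]),          # ch against a gap
                    (new[j - 1][0] + gap, new[j - 1][1]),  # target[j-1] against a gap
                ]
                new.append(max(candidates, key=lambda c: c[0]))
            row = new
        final_rows.append(row)

    # Best chain must cover the whole target (global alignment).
    return max((r[m] for r in final_rows), key=lambda c: c[0], default=empty_row[m])


# Toy example: two true exons separated by an intron, plus two decoy blocks.
genomic = "ACGTGTAAAGTGCA"
target = "ACGTTGCA"
candidates = [(0, 4), (2, 8), (4, 10), (10, 14)]
score, chain = spliced_alignment(genomic, target, candidates)
```

Because each block reuses the best rows of the blocks that end before it, the work grows polynomially with the number of blocks and the target length instead of enumerating every possible chain. In the toy example, the chain `((0, 4), (10, 14))` splices to exactly the target string.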
Authors:
Publication Date:
OSTI Identifier:
764709
DOE Contract Number:
FG02-94ER61919
Resource Type:
Technical Report
Resource Relation:
Other Information: PBD: 29 Oct 1997
Research Org:
Department of Computer Science, The Pennsylvania State Univ., University Park, PA 16802 (US)
Sponsoring Org:
USDOE Office of Energy Research (ER) (US)
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; ALGORITHMS; ALIGNMENT; CODONS; GENES; POLYNOMIALS; PROTEINS; SPLICING; TARGETS