Spliced alignment: A new approach to gene recognition

Gelfand, M S; Mironov, A A; Pevzner, P A

Abstract

Gene structure prediction is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics and artificial intelligence and, surprisingly enough, applications of theoretical computer science methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open the way to a new combinatorial approach to gene recognition. This paper describes a spliced alignment algorithm and a software tool which explores all possible exon assemblies in polynomial time and finds the multi-exon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; the authors also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and the actual genes was 99%, which is very high accuracy compared with other existing methods. The algorithm correctly reconstructed 87% of genes, and the rare discrepancies between the predicted and real exon-intron structures were caused by (i) extremely short (fewer than 5 amino acids) initial or terminal exons, (ii) alternative splicing, or (iii) errors in database feature tables.
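To make the idea of "exploring all possible exon assemblies in polynomial time" concrete, here is a minimal sketch, for illustration only. It assumes the candidate exons are given as non-empty half-open intervals (blocks) of a genomic string, and it finds the chain of non-overlapping blocks whose left-to-right concatenation aligns best to a target sequence, using dynamic programming over (block, position in block, position in target). This is a simplification and not the authors' recurrence or software: it compares characters directly with hypothetical match/mismatch/linear-gap scores, whereas the paper fits a related protein. The function name and scoring parameters are hypothetical.

```python
from typing import Dict, List, Tuple


def spliced_alignment(genome: str, blocks: List[Tuple[int, int]], target: str,
                      match: int = 1, mismatch: int = -1, indel: int = -1) -> float:
    """Best global alignment score between `target` and any chain of
    non-overlapping blocks of `genome`, concatenated left to right.

    blocks: hypothetical candidate exons as half-open intervals [start, end),
    assumed non-empty. Scoring (match/mismatch/linear indel) is a simplified,
    hypothetical scheme. Returns -inf if no blocks are given.
    """
    n = len(target)
    # Process blocks by start position so every possible predecessor
    # (a block ending at or before this one starts) is already finished.
    order = sorted(range(len(blocks)), key=lambda k: blocks[k][0])
    last_row: Dict[int, List[int]] = {}  # DP row after the last symbol of block k
    best = float("-inf")
    for k in order:
        start, end = blocks[k]
        # Entry row: either the chain starts with this block (target prefix
        # aligned to gaps) or it continues from a compatible predecessor block.
        prev = [j * indel for j in range(n + 1)]
        for l, row in last_row.items():
            if blocks[l][1] <= start:
                prev = [max(a, b) for a, b in zip(prev, row)]
        # Standard global-alignment recurrence over the symbols of block k.
        for i in range(start, end):
            cur = [prev[0] + indel] + [0] * n
            for j in range(1, n + 1):
                s = match if genome[i] == target[j - 1] else mismatch
                cur[j] = max(prev[j - 1] + s,      # align genome[i] with target[j-1]
                             prev[j] + indel,      # genome[i] against a gap
                             cur[j - 1] + indel)   # target[j-1] against a gap
            prev = cur
        last_row[k] = prev
        best = max(best, prev[n])  # a chain may end with block k
    return best
```

For example, `spliced_alignment("xxCGTyyyGCAzz", [(2, 5), (8, 11), (0, 3)], "CGTGCA")` returns 6: the blocks `CGT` and `GCA` chain into the target exactly, while the decoy block (0, 3) overlaps (2, 5) and cannot be chained with it. The number of possible block chains can grow exponentially with the number of blocks, but this table-filling approach runs in roughly O(L·n + b²·n) time, where L is the total block length, b the number of blocks and n the target length.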

Authors:

Gelfand, M S [1]; Mironov, A A [2]; Pevzner, P A [3]

[1] Inst. of Protein Research, Puschino (Russian Federation)
[2] NIIGENETIKA, Moscow (Russian Federation)
[3] Univ. of California, Los Angeles, CA (United States)

Publication Date: 1996

OSTI Identifier: 495270

Report Number(s): CONF-960679-
TRN: 97:000617-0002

Resource Type: Conference

Resource Relation: Conference: 7th Symposium on Combinatorial Pattern Matching, Laguna Beach, CA (United States), 10-12 Jun 1996; Other Information: publication date 1996; Related Information: part of Combinatorial Pattern Matching, Hirschberg, D. and Myers, G. (eds.), 393 p.

Country of Publication: United States

Language: English

Subject: 55 BIOLOGY AND MEDICINE, BASIC STUDIES; HUMAN CHROMOSOMES; PATTERN RECOGNITION; GENETIC MAPPING; COMPUTER CALCULATIONS; CODONS; EXONS

Citation Formats


Gelfand, M S, Mironov, A A, and Pevzner, P A. Spliced alignment: A new approach to gene recognition. United States: N. p., 1996. Web.



Gelfand, M S, Mironov, A A, & Pevzner, P A. Spliced alignment: A new approach to gene recognition. United States.



Gelfand, M S, Mironov, A A, and Pevzner, P A. 1996. "Spliced alignment: A new approach to gene recognition". United States.



@inproceedings{osti_495270,
  title        = {Spliced alignment: A new approach to gene recognition},
  author       = {Gelfand, M S and Mironov, A A and Pevzner, P A},
  abstractNote = {Gene structure prediction is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics and artificial intelligence and, surprisingly enough, applications of theoretical computer science methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way towards a new combinatorial approach to gene recognition. This paper describes a spliced alignment algorithm and a software tool which explores all possible exon assemblies in polynomial time and finds the multi-exon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; the authors also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives the average correlation between the predicted and the actual genes was 99%, which is a very high accuracy as compared with other existing methods. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused by either (i) extremely short (less than 5 amino acids) initial or terminal exons, or (ii) alternative splicing, or (iii) errors in database feature tables. 38 refs., 3 tabs.},
  booktitle    = {Combinatorial Pattern Matching},
  editor       = {Hirschberg, D. and Myers, G.},
  url          = {https://www.osti.gov/biblio/495270},
  place        = {United States},
  year         = {1996}
}





Similar records in OSTI.GOV collections:

Combinatorial methods for gene recognition

Technical Report: Pevzner, P

The major result of the project is the development of a new approach to gene recognition called the spliced alignment algorithm. They have developed an algorithm and implemented a software tool (for both IBM PC and UNIX platforms) which explores all possible exon assemblies in polynomial time and finds the multi-exon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully performs exon assemblies even in the case of short exons or exons with unusual codon usage; they also report correct assemblies for genes with more than 10 exons provided a homologous protein is…
https://doi.org/10.2172/764709

Las Vegas algorithms for gene recognition: Suboptimal and error-tolerant spliced alignment

Conference: Sze, Sing-Hoi; Pevzner, P

Recently, Gelfand, Mironov and Pevzner proposed a spliced alignment approach to gene recognition that provides 99% accurate recognition of human genes if a related mammalian protein is available. However, even 99% accurate gene predictions are insufficient for automated sequence annotation in large-scale sequencing projects and therefore have to be complemented by experimental gene verification. 100% accurate gene predictions would lead to a substantial reduction of experimental work on gene identification. Our goal is to develop an algorithm that either predicts an exon assembly with accuracy sufficient for sequence annotation or warns a biologist that the accuracy of a prediction is…

The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames

Technical Report: Solovyev, V; Salamov, A; Lawrence, C

Discriminant analysis is applied to the problem of recognizing 5'-, internal and 3'-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of particular types. The method is based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine information about significant triplet frequencies in various functional parts of splice site regions and oligonucleotide preferences in protein-coding and intron regions. The accuracy of our splice site recognition function is about 97%. A discriminant function for 5'-exon prediction includes the hexanucleotide composition of the upstream region, the triplet composition around the ATG codon, ORF coding…

Genomic organization of the human α-adducin gene and its alternately spliced isoforms

Journal Article: Lin, B; Nasir, J; McDonald, H (Genomics)

The cDNA for the human α-adducin gene has been cloned, and different alternately spliced forms have been identified. We report the complete genomic organization of the human α-adducin gene and these alternately spliced forms. The human α-adducin gene, spanning approximately 85 kb, consists of 16 exons ranging in size from 34 to 1892 bp. One of the spliced forms of the human α-adducin gene results from alternate use of the 5′ splice donor site for exon 10, while another results in a truncated protein following insertion of 34 bp comprising exon 15, followed by a premature stop codon. This alternate…
https://doi.org/10.1016/0888-7543(95)80113-Z

Gene structure for the α1 chain of a human short-chain collagen (type XIII) with alternatively spliced transcripts and translation termination codon at the 5' end of the last exon

Journal Article: Tikka, L; Pihlajaniemi, T; Henttu, P; ... (Proceedings of the National Academy of Sciences of the United States of America; USA)

Two overlapping human genomic clones that encode a short-chain collagen, designated α1(XIII), were isolated by using recently described cDNA clones. Characterization of the cosmid clones that span approximately 65,000 base pairs (bp) of the 3' end of the gene established several unusual features of this collagen gene. The last exon encodes solely the 3' untranslated region and it begins with a complete stop codon. The 10 adjacent exons vary in size from 27 to 87 bp and two of them are 54 bp. Therefore, the α1-chain gene of type XIII collagen has some features found in genes for fibrillar collagens…
https://doi.org/10.1073/pnas.85.20.7491
