
Title: Identification of human gene structure using linear discriminant functions and dynamic programming

Abstract

Developing advanced techniques for identifying gene structure is one of the main challenges of the Human Genome Project. Discriminant analysis was applied to the construction of recognition functions for various components of gene structure. Linear discriminant functions for splice sites, 5'-coding, internal exon, and 3'-coding region recognition have been developed. A gene structure prediction system, FGENE, has been developed based on the exon recognition functions. We compute a graph of mutual compatibility of different exons and represent gene structure models as paths of this directed acyclic graph. To select an optimal model, we apply a variant of the dynamic programming algorithm to search for the path in the graph with the maximal value of the corresponding discriminant functions. Prediction by FGENE for 185 complete human gene sequences has 81% exact exon recognition accuracy and 91% accuracy at the level of individual exon nucleotides, with a correlation coefficient (C) of 0.90. Testing FGENE on 35 genes not used in the development of the discriminant functions shows 71% accuracy of exact exon prediction and 89% at the nucleotide level (C=0.86). FGENE compares very favorably with the other programs currently used to predict protein-coding regions. Analysis of uncharacterized human sequences based on our methods for splice site (HSPL, RNASPL), internal exon (HEXON), all types of exons (FEXH), and human (FGENEH) and bacterial (CDSB) gene structure prediction, and for recognition of human and bacterial sequences (HBR) (to test a library for E. coli contamination), is available through the University of Houston, Weizmann Institute of Science network server and a WWW page of the Human Genome Center at Baylor College of Medicine.
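
The path search described in the abstract can be illustrated with a minimal sketch. It is not FGENE's implementation: the `Exon` tuple, the `compatible` rule (one exon may follow another only if it starts after the other ends) and the scores are assumptions invented for this example, since the abstract does not state the actual compatibility criteria or how the discriminant scores are combined. Here the score of a path is taken to be the sum of its exon scores.

```python
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    start: int    # first nucleotide position (inclusive)
    end: int      # last nucleotide position (inclusive)
    score: float  # hypothetical discriminant-function score


def compatible(a: Exon, b: Exon) -> bool:
    """Assumed rule: exon b may follow exon a if b starts after a ends."""
    return a.end < b.start


def best_gene_model(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the highest-scoring chain of mutually compatible exons."""
    if not exons:
        return 0.0, []
    # Sorting by start position gives a topological order of the DAG,
    # because every edge a -> b satisfies a.start <= a.end < b.start.
    order = sorted(exons, key=lambda e: (e.start, e.end))
    best = [e.score for e in order]  # best[i]: top score of a path ending at exon i
    prev = [-1] * len(order)         # prev[i]: predecessor of exon i on that path
    for i, cur in enumerate(order):
        for j in range(i):
            candidate = best[j] + cur.score
            if compatible(order[j], cur) and candidate > best[i]:
                best[i] = candidate
                prev[i] = j
    # Trace back from the best-scoring end point.
    i = max(range(len(order)), key=best.__getitem__)
    total = best[i]
    path: List[Exon] = []
    while i != -1:
        path.append(order[i])
        i = prev[i]
    path.reverse()
    return total, path
```

Calling `best_gene_model` on a list of candidate exons returns the total score and the selected exons in order. The double loop is quadratic in the number of candidates, which is adequate for a sketch; a production system would likely prune or index the candidates.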

Authors:
Solovyev, V V; Salamov, A A; Lawrence, C B [1]
  1. Baylor College of Medicine, Houston, TX (United States)
Publication Date:
1995
Research Org.:
Stanford Univ., CA (United States)
OSTI Identifier:
401866
Report Number(s):
CONF-9507246-
TRN: 96:005602-0044
Resource Type:
Technical Report
Resource Relation:
Conference: Intelligent Systems for Molecular Biology (ISMB) conference, Cambridge (United Kingdom), 16-19 Jul 1995; Other Information: PBD: 1995; Related Information: Is Part Of ISMB-95 -- Third international conference on intelligent systems for molecular biology: Proceedings; Rawlings, C.; Clark, D.; Altman, R.; Hunter, L.; Lengauer, T.; Wodak, S. [eds.]; PB: 427 p.
Country of Publication:
United States
Language:
English
Subject:
55 BIOLOGY AND MEDICINE, BASIC STUDIES; 99 MATHEMATICS, COMPUTERS, INFORMATION SCIENCE, MANAGEMENT, LAW, MISCELLANEOUS; GENES; F CODES; ACCURACY; DYNAMIC PROGRAMMING; NUCLEOTIDES; GENETIC MAPPING; STATISTICS

Citation Formats

Solovyev, V V, Salamov, A A, and Lawrence, C B. Identification of human gene structure using linear discriminant functions and dynamic programming. United States: N. p., 1995. Web.
Solovyev, V V, Salamov, A A, & Lawrence, C B. Identification of human gene structure using linear discriminant functions and dynamic programming. United States.
Solovyev, V V, Salamov, A A, and Lawrence, C B. 1995. "Identification of human gene structure using linear discriminant functions and dynamic programming". United States.
@article{osti_401866,
title = {Identification of human gene structure using linear discriminant functions and dynamic programming},
author = {Solovyev, V V and Salamov, A A and Lawrence, C B},
abstractNote = {Developing advanced techniques for identifying gene structure is one of the main challenges of the Human Genome Project. Discriminant analysis was applied to the construction of recognition functions for various components of gene structure. Linear discriminant functions for splice sites, 5'-coding, internal exon, and 3'-coding region recognition have been developed. A gene structure prediction system, FGENE, has been developed based on the exon recognition functions. We compute a graph of mutual compatibility of different exons and represent gene structure models as paths of this directed acyclic graph. To select an optimal model, we apply a variant of the dynamic programming algorithm to search for the path in the graph with the maximal value of the corresponding discriminant functions. Prediction by FGENE for 185 complete human gene sequences has 81% exact exon recognition accuracy and 91% accuracy at the level of individual exon nucleotides, with a correlation coefficient (C) of 0.90. Testing FGENE on 35 genes not used in the development of the discriminant functions shows 71% accuracy of exact exon prediction and 89% at the nucleotide level (C=0.86). FGENE compares very favorably with the other programs currently used to predict protein-coding regions. Analysis of uncharacterized human sequences based on our methods for splice site (HSPL, RNASPL), internal exon (HEXON), all types of exons (FEXH), and human (FGENEH) and bacterial (CDSB) gene structure prediction, and for recognition of human and bacterial sequences (HBR) (to test a library for E. coli contamination), is available through the University of Houston, Weizmann Institute of Science network server and a WWW page of the Human Genome Center at Baylor College of Medicine.},
url = {https://www.osti.gov/biblio/401866},
place = {United States},
year = {1995},
month = dec
}
