Title: The gene identification problem: An overview for developers

Conference

The gene identification problem is the problem of interpreting nucleotide sequences by computer, in order to provide tentative annotation on the location, structure, and functional class of protein-coding genes. This problem is of self-evident importance and is far from being fully solved, particularly for higher eukaryotes. Thus it is not surprising that the number of algorithm and software developers working in this area is rapidly increasing. The present paper is an overview of the field, with an emphasis on eukaryotes, for such developers.

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
National Insts. of Health, Bethesda, MD (United States)
DOE Contract Number:
W-7405-ENG-36
OSTI ID:
64182
Report Number(s):
LA-UR-95-1163; CONF-9407180-1; ON: DE95010887; TRN: 95:004367
Resource Relation:
Journal Volume: 20; Journal Issue: 1; Conference: 4. international workshop on open problems in computational biology, Telluride, CO (United States), 10-17 Jul 1994; Other Information: PBD: 27 Mar 1995
Country of Publication:
United States
Language:
English
