The gene identification problem: An overview for developers
Abstract
The gene identification problem is the problem of interpreting nucleotide sequences by computer, in order to provide tentative annotation on the location, structure, and functional class of protein-coding genes. This problem is of self-evident importance, and is far from being fully solved, particularly for higher eukaryotes. Thus it is not surprising that the number of algorithm and software developers working in this area is rapidly increasing. The present paper is an overview of the field, with an emphasis on eukaryotes, for such developers.
- Authors:
- Fickett, J W
- Publication Date:
- 27 Mar 1995
- Research Org.:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- National Insts. of Health, Bethesda, MD (United States)
- OSTI Identifier:
- 64182
- Report Number(s):
- LA-UR-95-1163; CONF-9407180-1
Journal ID: ISSN 0097-8485; ON: DE95010887; TRN: 95:004367
- DOE Contract Number:
- W-7405-ENG-36
- Resource Type:
- Conference
- Resource Relation:
- Journal Volume: 20; Journal Issue: 1; Conference: 4. international workshop on open problems in computational biology, Telluride, CO (United States), 10-17 Jul 1994; Other Information: PBD: 27 Mar 1995
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 55 BIOLOGY AND MEDICINE, BASIC STUDIES; NUCLEOTIDES; DNA SEQUENCING; COMPUTER CALCULATIONS; GENES
Citation Formats
Fickett, J W. The gene identification problem: An overview for developers. United States: N. p., 1995. Web. doi:10.1016/S0097-8485(96)80012-X.
Fickett, J W. The gene identification problem: An overview for developers. United States. https://doi.org/10.1016/S0097-8485(96)80012-X
Fickett, J W. 1995. "The gene identification problem: An overview for developers". United States. https://doi.org/10.1016/S0097-8485(96)80012-X. https://www.osti.gov/servlets/purl/64182.
@article{osti_64182,
title = {The gene identification problem: An overview for developers},
author = {Fickett, J W},
abstractNote = {The gene identification problem is the problem of interpreting nucleotide sequences by computer, in order to provide tentative annotation on the location, structure, and functional class of protein-coding genes. This problem is of self-evident importance, and is far from being fully solved, particularly for higher eukaryotes. Thus it is not surprising that the number of algorithm and software developers working in this area is rapidly increasing. The present paper is an overview of the field, with an emphasis on eukaryotes, for such developers.},
doi = {10.1016/S0097-8485(96)80012-X},
url = {https://www.osti.gov/biblio/64182},
journal = {},
issn = {0097-8485},
number = 1,
volume = 20,
place = {United States},
year = {1995},
month = {3}
}