OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Construction of an integrated database to support genomic sequence analysis

Abstract

The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

Authors:
Gilbert, W.; Overbeek, R.
Publication Date:
1994-11-01
Research Org.:
Harvard Univ., Cambridge, MA (United States)
Sponsoring Org.:
USDOE, Washington, DC (United States)
OSTI Identifier:
10192009
Report Number(s):
DOE/ER/61640-T1
ON: DE95001910; TRN: 94:010073
DOE Contract Number:
FG02-93ER61640
Resource Type:
Technical Report
Resource Relation:
Other Information: PBD: [1994]
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; GENOTYPE; INFORMATION SYSTEMS; COMPUTER CODES; METABOLISM; DNA SEQUENCING; GENETIC MAPPING; PROGRAMMING; AMINO ACID SEQUENCE; 550200; 990200; BIOCHEMISTRY; MATHEMATICS AND COMPUTERS

Citation Formats

Gilbert, W., and Overbeek, R. Construction of an integrated database to support genomic sequence analysis. United States: N. p., 1994. Web. doi:10.2172/10192009.
Gilbert, W., & Overbeek, R. (1994). Construction of an integrated database to support genomic sequence analysis. United States. doi:10.2172/10192009.
Gilbert, W., and Overbeek, R. 1994. "Construction of an integrated database to support genomic sequence analysis". United States. doi:10.2172/10192009. https://www.osti.gov/servlets/purl/10192009.
@techreport{osti_10192009,
title = {Construction of an integrated database to support genomic sequence analysis},
author = {Gilbert, W. and Overbeek, R.},
abstractNote = {The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.},
doi = {10.2172/10192009},
institution = {Harvard Univ., Cambridge, MA (United States)},
number = {DOE/ER/61640-T1},
place = {United States},
year = {1994},
month = nov
}

  • No abstract prepared.
  • The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models lack an abstraction mechanism for modelling sequences, and existing DBMSs lack operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object-oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need for a modelling framework that incorporates the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.
  • Within Massachusetts Bay, a number of areas have been used as disposal sites for permitted and possibly non-permitted dumping of radioactive and hazardous wastes. The human and environmental risks associated with these disposal practices are directly related to the location, contents and condition of discrete waste containers. The resource management problem is being addressed with Geographic Information System (GIS) techniques that allow the integration of a variety of data from several sources into a single, integrated relational database/GIS. The relational database will then be used as a tool for the comprehensive evaluation of source characterization and extent of contamination during public health and ecological risk assessments.
  • We have used logic programming to design and implement a prototype database of genomic information for the model bacterial organism Escherichia coli. This report presents the fundamental database primitives that can be used to access and manipulate data relating to the E. coli genome. The present system, combined with a tutorial manual, provides immediate access to the integrated knowledge base for E. coli chromosome data. It also serves as the foundation for development of more user-friendly interfaces that have the same retrieval power and high-level tools to analyze complex chromosome organization.