Title: Sequence modelling and an extensible data model for genomic database

The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models lack abstraction mechanisms for modelling sequences, and existing DBMSs lack operations on complex sequences. This work addresses the problem of sequence modelling in the context of the HGP, and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is independent of application and implementation. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object-oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need for a modelling framework that incorporates the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented a query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.
Authors:
 [1]
  1. California Univ., San Francisco, CA (United States); Univ. of California, Berkeley, CA (United States)
Publication Date:
OSTI Identifier:
10159439
Report Number(s):
LBL--31935
ON: DE92017107
DOE Contract Number:
AC03-76SF00098
Resource Type:
Thesis/Dissertation
Resource Relation:
Other Information: TH: Thesis (Ph.D.); PBD: Jan 1992
Research Org:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; GENETIC MAPPING; INFORMATION THEORY; DATA PROCESSING; MATHEMATICAL MODELS; MAN; PATTERN RECOGNITION; SET THEORY; DNA; 550200; 550400; BIOCHEMISTRY; GENETICS