U.S. Department of Energy
Office of Scientific and Technical Information

An efficient bit string implementation of a database cross-field association system (with an application to protein sequence patterns)

Conference · OSTI ID: 10162003
 [1]; ;  [2]
  1. Los Alamos National Lab., NM (United States)
  2. Biomolecular Engineering Research Center, Boston Univ., Boston, MA (United States)

We present a fast implementation of an algorithm to infer correlations between database queries. The implementation has been designed primarily to obtain, automatically, the best description of the function of a given protein sequence pattern. We assume that such a description is the query on the functional annotation of a protein sequence database whose extension in the database is closest to the extension of the pattern. The functional annotation of a protein sequence database can be described as a set-valued attribute whose domain is a set of one-place predicates with biological meaning. The query language is then a first-order language, and the query space can be mapped into a set algebra in which a measure of set similarity is introduced. We have previously developed an algorithm to search such an algebra when negation is not considered. Here, we present an efficient implementation of such an algorithm, and we develop a method, incorporating this implementation, to search a protein sequence database exhaustively for biologically relevant protein sequence patterns. The method relies on the initial generation of an extensive collection of amino acid sequence motifs, corresponding to regions of high information density in long consensus patterns derived from homologous protein families, and on their automatic evaluation using the above implementation. We have used this method to automatically search the SWISSPROT protein sequence database. The results suggest that potentially meaningful amino acid sequence patterns have been discovered.
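
The idea of comparing query extensions as bit strings can be illustrated with a minimal sketch. This record does not give the authors' data layout, similarity measure, or search strategy, so everything below is an assumption made for illustration: each extension is packed into a Python integer with one bit per database entry, the Jaccard index stands in for the unspecified measure of set similarity, and the search is a brute-force pass over conjunctions of predicates without negation. The names `to_bits`, `similarity`, and `best_description`, and the example annotations, are hypothetical.

```python
from itertools import combinations


def to_bits(entry_ids):
    """Pack a set of database entry indices into one integer (one bit per entry)."""
    bits = 0
    for i in entry_ids:
        bits |= 1 << i
    return bits


def popcount(bits):
    """Number of entries in the set encoded by a bit string."""
    return bin(bits).count("1")


def similarity(a, b):
    """Jaccard index of two bit strings (an assumed measure, not the paper's)."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0


def best_description(pattern_bits, annotations, max_terms=2):
    """Brute-force search over conjunctions of up to max_terms predicates.

    annotations maps a predicate name to the bit string of entries it covers.
    Returns (score, predicate_names) for the closest extension found.
    """
    best = (0.0, ())
    for k in range(1, max_terms + 1):
        for names in combinations(sorted(annotations), k):
            extension = annotations[names[0]]
            for name in names[1:]:
                extension &= annotations[name]  # conjunction = set intersection
            score = similarity(pattern_bits, extension)
            if score > best[0]:
                best = (score, names)
    return best


# Hypothetical annotations over an 8-entry database.
annotations = {
    "kinase": to_bits({0, 1, 2, 3}),
    "membrane": to_bits({2, 3, 4, 5}),
    "zinc_finger": to_bits({6, 7}),
}
pattern = to_bits({2, 3})  # entries matched by a sequence pattern
print(best_description(pattern, annotations))
```

In this toy case the pattern matches exactly the entries annotated with both "kinase" and "membrane", so the conjunction of those two predicates is returned with similarity 1.0. Intersection and union reduce to single bitwise operations on the packed integers, which is the kind of saving a bit string representation offers.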

Research Organization:
Los Alamos National Lab., NM (United States)
Sponsoring Organization:
USDOE, Washington, DC (United States)
DOE Contract Number:
W-7405-ENG-36
OSTI ID:
10162003
Report Number(s):
LA-UR--92-1909; CONF-930117--1; ON: DE92017525
Country of Publication:
United States
Language:
English