
Title: An expert system for processing sequence homology data

Technical Report
OSTI ID:377164
;  [1]
  1. The Sanger Center, Cambridge (United Kingdom)

When confronted with the task of finding homology to large numbers of sequences, database searching tools such as BLAST and FASTA generate prohibitively large amounts of information. An automatic way of making most of the decisions a trained sequence analyst would make was developed by means of a rule-based expert system combined with an algorithm to avoid non-informative matches caused by biased residue composition. The results the system finds relevant are presented concisely and clearly, so that the homology can be assessed with minimum effort. The expert system, HSPcrunch, was implemented to process the output of the programs in the BLAST suite. HSPcrunch embodies rules for detecting distant similarities when pairs of weak matches are consistent with a larger gapped alignment, i.e. when BLAST has broken a longer gapped alignment up into smaller ungapped ones. This way, more distant similarities can be detected with little or no increase in spurious matches. The rules for how small the gaps must be to be considered significant have been derived empirically. Currently, a set of rules is used that operates on two different scoring levels: one for very weak matches that have very small gaps and one for medium weak matches that have slightly larger gaps. This set of rules proved to be robust for most cases and gives high-fidelity separation between real homologies and spurious matches. One of the most important rules for reducing the amount of output is to limit the number of overlapping matches to the same region of the query sequence. This way, a region with many high-scoring matches will not dominate the output and hide weaker but relevant matches to other regions. This is particularly valuable for multi-domain queries.
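
The two rules described in the abstract can be illustrated with a minimal Python sketch. It is not the HSPcrunch implementation: the `HSP` record, the score and gap thresholds in `TIERS`, the `max_cover` parameter, and the exact drop criterion (a match is discarded only when every query position it spans is already covered by `max_cover` better matches) are all assumptions chosen for illustration, since the abstract states only that the real thresholds were derived empirically.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """One ungapped match (hypothetical record; coordinates are 1-based, inclusive)."""
    subject: str
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: float


# Hypothetical two-level rule set: (minimum score, largest gap allowed).
# The real values were derived empirically and are not given in the abstract.
TIERS = [
    (35.0, 5),   # very weak matches: only very small gaps
    (50.0, 15),  # medium weak matches: slightly larger gaps
]


def max_gap_for(score):
    """Return the largest gap allowed at this score, or None if below every tier."""
    allowed = None
    for min_score, max_gap in TIERS:
        if score >= min_score:
            allowed = max_gap
    return allowed


def consistent_pair(a, b):
    """True if two weak matches could be pieces of one larger gapped alignment."""
    if a.subject != b.subject:
        return False
    first, second = sorted((a, b), key=lambda h: h.q_start)
    q_gap = second.q_start - first.q_end - 1
    s_gap = second.s_start - first.s_end - 1
    if q_gap < 0 or s_gap < 0:
        return False  # overlapping, or in a different order on query and subject
    limit = max_gap_for(min(a.score, b.score))  # the weaker match sets the tier
    return limit is not None and max(q_gap, s_gap) <= limit


def limit_coverage(hsps, query_len, max_cover=2):
    """Keep the best matches, skipping those whose query region is already saturated."""
    depth = [0] * (query_len + 1)  # depth[i] = kept matches covering query position i
    kept = []
    for h in sorted(hsps, key=lambda h: h.score, reverse=True):
        region = range(h.q_start, h.q_end + 1)
        if all(depth[i] >= max_cover for i in region):
            continue  # better matches already cover this whole region
        for i in region:
            depth[i] += 1
        kept.append(h)
    return kept
```

With these assumed settings, three strong matches to query positions 1-50 and one weak match to positions 60-100 reduce to the two strongest matches plus the weak one, so the second region stays visible in the output.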

Research Organization:
Stanford Univ., CA (United States)
OSTI ID:
377164
Report Number(s):
CONF-9408117-; TRN: 96:005197-0044
Resource Relation:
Conference: 2. international conference on intelligent systems for molecular biology, Stanford, CA (United States), 15-17 Aug 1994; Other Information: PBD: [1994]; Related Information: Is Part Of Proceedings: Second international conference on intelligent systems for molecular biology; Altman, R.; Brutlag, D.; Karp, P.; Lathrop, R.; Searls, D. [eds.]; PB: 389 p.
Country of Publication:
United States
Language:
English
