Summary: When is a String Like a String?
L. Allison, C. S. Wallace and C. N. Yee.
Department of Computer Science,
UUCP: xxx@.cs.monash.edu.au xxx=[lloyd, csw, cyee]
Supported by Australian Research Council grant A48830856
version: 15 Dec 1989
Presented at the International Symposium on Artificial Intelligence and Mathematics, Ft. Lauderdale,
Florida, January 3-5, 1990.
Abstract. The question of whether or not two strings are related and, if so, of how they are related, and the
problem of finding a good model of string relation, are treated as inductive inference problems. A method
of inductive inference known as minimum message length encoding is applied to them. It allows the
posterior odds ratio of two theories or hypotheses to be computed. New string comparison algorithms and
methods are derived and existing algorithms are placed in a unifying framework. The connection between
string comparison algorithms and models of relation is made explicit. The posterior probability of two
strings' being related can be calculated, giving a test of significance. The methods are relevant to DNA and
to other biological macromolecules.
Keywords: string, inductive inference, minimum message length, pattern matching, DNA, macro