Title: A robust model for finding optimal evolutionary trees

Conference
OSTI ID:10143202
 [1];  [2];  [3]
  1. Rutgers--the State Univ., Piscataway, NJ (United States)
  2. Arizona Univ., Tucson, AZ (United States). Dept. of Computer Science
  3. Sandia National Labs., Albuquerque, NM (United States)

Constructing evolutionary trees for species sets is a fundamental problem in biology. One of the standard models assumes that a distance can be computed between every pair of species, and seeks an edge-weighted tree T in which the distance $d_{ij}^{T}$ between the leaves of T corresponding to species i and j exactly equals the observed distance $d_{ij}$. When such a tree exists, the biological literature says that the distance function or matrix is additive, and trees can be constructed from additive distance matrices in $O(n^2)$ time. Real distance data is hardly ever additive, and we therefore need methods (such as approximation algorithms with guaranteed error bounds) for handling such data. In this paper we present several natural and realistic ways of modeling the inaccuracies in the distance data. In one model we assume that we have upper and lower bounds for the distances between pairs of species and try to find an additive distance matrix between these bounds. In a second model we are given a partial matrix and asked whether the unspecified entries can be filled in so that the entire matrix is additive. For both of these models we also consider the more restrictive problem of finding a matrix that fits a tree which is not only additive but also ultrametric. Ultrametric matrices correspond to trees which can be rooted so that the distance from the root to any leaf is the same. Ultrametric matrices are desirable in biology since the trees then indicate evolutionary time. We give polynomial time algorithms for some of the problems while showing others to be NP-Complete. We also consider various ways of "fitting" a given distance matrix to a tree so as to minimize various criteria of error in the fit. For most criteria this optimization problem turns out to be NP-Hard, while we do get polynomial time algorithms for some.
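To make the two matrix classes in the abstract concrete, the sketch below tests a small distance matrix for additivity and ultrametricity. It relies on the standard textbook characterizations (the four-point and three-point conditions), which the abstract itself does not state. The function names `is_additive` and `is_ultrametric` and the tolerance parameter are hypothetical, the input is assumed to be a symmetric matrix with a zero diagonal, and the brute-force checks here are not the $O(n^2)$ tree construction the abstract refers to.

```python
from itertools import combinations, combinations_with_replacement


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple of species, the two
    largest of the three pairwise distances are equal.
    Assumes d is symmetric with a zero diagonal."""
    n = len(d)
    for i, j, k in combinations(range(n), 3):
        a, b, c = sorted((d[i][j], d[i][k], d[j][k]))
        if c - b > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """Four-point condition: for every four (not necessarily distinct)
    species, the two largest of the three pairwise sums are equal.
    Allowing repeated indices also enforces the triangle inequality.
    Assumes d is symmetric with a zero diagonal."""
    n = len(d)
    for i, j, k, l in combinations_with_replacement(range(n), 4):
        sums = sorted((d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]))
        if sums[2] - sums[1] > tol:
            return False
    return True


# Distances from a hypothetical tree ((a,b),(c,d)) with pendant edge
# weights 1, 2, 3, 4 and an internal edge of weight 5.
tree_metric = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]

# A rooted tree where every leaf is equally far from the root.
clock_like = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]

# No edge-weighted tree realizes these distances exactly.
not_a_tree = [
    [0, 1, 2, 2],
    [1, 0, 3, 2],
    [2, 3, 0, 1],
    [2, 2, 1, 0],
]
```

In this sketch, `tree_metric` passes the additive test but fails the ultrametric one, `clock_like` passes both, and `not_a_tree` fails both, which matches the abstract's remark that ultrametricity is the more restrictive requirement.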

Research Organization:
Sandia National Labs., Albuquerque, NM (United States)
Sponsoring Organization:
USDOE, Washington, DC (United States); National Science Foundation, Washington, DC (United States)
DOE Contract Number:
AC04-76DP00789
Report Number(s):
SAND-93-0361C; CONF-9305153-2; ON: DE93010682; CNN: Contract STC-88-09648
Resource Relation:
Conference: Association for Computing Machinery (ACM) symposium on the theory of computing, San Diego, CA (United States), May 1993; Other Information: PBD: [1993]
Country of Publication:
United States
Language:
English