 
1 IBD Estimation Accuracy
We consider how well each method does at assigning a high probability to the true IBD
state at a locus. To evaluate this we consider all loci with true IBD state S = r. From
among these loci, and given a threshold, we plot the fraction of loci whose estimated
probability of the locus being in the true state is below this threshold. That is, if we let
$\hat{U}_r = P(S_k = r \mid G)$ be the estimated probability of a locus $k$ being in state $r$ given the
genotype data $G$, we plot the cumulative distribution function (CDF) $F_r(u) = P(\hat{U}_r \le u \mid S_k = r)$.
With complete information a perfect method would always estimate the probability of a
locus being in its true state as 1.0, resulting in a horizontal line $F_r(u) = 0$ for $0 \le u < 1.0$
and a spike $F_r(u) = 1.0$ for $u = 1.0$. Under incomplete information an exact, unbiased
method would result in a smooth curve, with more information yielding curves that
more closely match the complete-information case. Positively biased methods
will have curves shifted to the right for positive IBD states and shifted to the left for the
zero IBD state. That is, they would assign higher probability to positive IBD states and
lower probability to the zero IBD state relative to an unbiased method.
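As an illustration of the CDF described above, the following is a minimal sketch of how the empirical curve for one IBD state could be computed. It is not the code used for the analysis; the function name true_state_cdf, the input layout (one true state and one vector of estimated state probabilities per locus) and the toy values are all assumptions made for this example.

```python
from bisect import bisect_right


def true_state_cdf(true_states, posteriors, r, thresholds):
    """Empirical F_r(u): among loci whose true IBD state is r, the fraction
    whose estimated probability of state r is at most u.

    true_states: true IBD state (0, 1 or 2) per locus (hypothetical layout).
    posteriors:  per locus, estimated probabilities of states 0, 1 and 2.
    """
    # Estimated probability of the true state at every locus with S_k = r.
    u_hat = sorted(
        post[r] for state, post in zip(true_states, posteriors) if state == r
    )
    if not u_hat:
        raise ValueError(f"no loci with true IBD state {r}")
    n = len(u_hat)
    # bisect_right counts the values <= u in the sorted list.
    return [bisect_right(u_hat, u) / n for u in thresholds]


# Toy data, invented for illustration only.
true_states = [1, 1, 0, 1, 2, 1]
posteriors = [
    [0.1, 0.8, 0.1],
    [0.3, 0.6, 0.1],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.2, 0.3, 0.5],
    [0.5, 0.4, 0.1],
]
thresholds = [0.25, 0.5, 0.75, 1.0]

cdf_state1 = true_state_cdf(true_states, posteriors, 1, thresholds)
print(cdf_state1)  # [0.0, 0.25, 0.5, 1.0]
```

A method that always assigns probability 1.0 to the true state would return 0 at every threshold below 1.0 and 1.0 at the threshold 1.0, which is the complete-information shape described above.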
Supplementary Figure S1 shows the CDF plots for IBD sharing of 0, 1 and 2 alleles
for sibling pairs. We see that methods that include LD in the model show the highest
accuracy at estimating the probability of being in a particular IBD state. Introducing
missing data and genotyping error reduces the accuracy of all methods, with the MERLIN
methods suffering most. Thinning the genotype data to one SNP per cM as a means of
