 
Fitting Algebraic Curves to Noisy Data
Sanjeev Arora, Subhash Khot
December 14, 2002
Abstract
We introduce the following problem, which is motivated by applications in vision and
pattern detection: We are given pairs of data points $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m) \in
[-1, 1] \times [-1, 1]$, a noise parameter $\delta > 0$, a degree bound $d$, and a threshold $\rho > 0$. We
desire an algorithm that lists every degree-$d$ polynomial $h$ such that
$|h(x_i) - y_i| \le \delta$ for at least a $\rho$ fraction of the indices $i$. (1)
If $\delta = 0$, this is just the list decoding problem that has been popular in complexity theory
and for which Sudan gave a poly(m, d)-time algorithm. However, for $\delta > 0$, the problem
as stated becomes ill-posed and one needs a careful reformulation (see Introduction).
We prove a few basic results about this (reformulated) problem. We show that the problem
has no polynomial-time algorithm. This is shown by exhibiting an instance of the
problem where the number of solutions is as large as $\exp(d^{0.5 - \epsilon})$ and every pair of
solutions is far from each other in the $\ell_\infty$ norm. On the algorithmic side, we give a rigorous
analysis of a brute-force algorithm that runs in exponential time. Also, in surprising
contrast to our lower bound, we give a polynomial-time algorithm for learning the polynomials,
assuming the data is generated using a mixture model in which the mixing weights are
"nondegenerate."
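
To make condition (1) concrete, the following minimal Python sketch checks whether one candidate polynomial fits a set of data points. It is an illustration only, not the algorithm analyzed in the paper. The function names (eval_poly, agreement_fraction, satisfies_condition), the coefficient-list representation, and the parameter names delta (noise parameter) and rho (threshold) are assumptions made for this example.

```python
from typing import Sequence, Tuple


def eval_poly(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial by Horner's rule; coeffs[k] multiplies x**k."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def agreement_fraction(
    coeffs: Sequence[float],
    points: Sequence[Tuple[float, float]],
    delta: float,
) -> float:
    """Fraction of points (x, y) with |h(x) - y| <= delta."""
    hits = sum(1 for x, y in points if abs(eval_poly(coeffs, x) - y) <= delta)
    return hits / len(points)


def satisfies_condition(
    coeffs: Sequence[float],
    points: Sequence[Tuple[float, float]],
    delta: float,
    rho: float,
) -> bool:
    """True if the polynomial is within delta of at least a rho fraction of points."""
    return agreement_fraction(coeffs, points, delta) >= rho


# Hypothetical data: four points near h(x) = x**2 plus one outlier at x = 0.5.
points = [(-1.0, 1.0), (-0.5, 0.3), (0.0, 0.0), (0.5, 0.9), (1.0, 1.05)]
h = [0.0, 0.0, 1.0]  # coefficients of x**2

print(agreement_fraction(h, points, delta=0.1))
print(satisfies_condition(h, points, delta=0.1, rho=0.75))
```

The sketch only verifies a given candidate. The problem in the abstract asks for every polynomial that passes this check, which is where the difficulty lies.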
