 
Summary: Probability Model Type Sufficiency
Leigh J. Fitzgibbon, Lloyd Allison and Joshua W. Comley
School of Computer Science and Software Engineering
Monash University, Victoria 3800, Australia
{leighf,lloyd,joshc}@bruce.csse.monash.edu.au
Abstract. We investigate the role of sufficient statistics in generalized
probabilistic data mining and machine learning software frameworks.
Some issues involved in the specification of a statistical model type are
discussed and we show that it is beneficial to explicitly include a sufficient
statistic and functions for its manipulation in the model type's
specification. Instances of such types can then be used by generalized
learning algorithms while maintaining optimal learning time complexity.
Examples are given for problems such as incremental learning and
data partitioning problems (e.g. changepoint problems, decision trees
and mixture models).
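
As an illustration of the idea summarized in the abstract, the following is a minimal sketch, not the authors' specification. It assumes a one-dimensional Gaussian model whose sufficient statistic is the triple (count, sum, sum of squares). The names GaussStats, fit, neg_log_likelihood and best_changepoint, and the variance floor min_var, are hypothetical and chosen only for this example. Because the statistic can be added to, merged and subtracted in constant time, a generic changepoint search can score every split of n data points in O(n) total rather than refitting each segment from raw data.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class GaussStats:
    """Hypothetical sufficient statistic for a 1-D Gaussian: count, sum, sum of squares."""
    n: int = 0
    sx: float = 0.0
    sxx: float = 0.0

    def add(self, x):
        # Incremental learning: absorb one datum in O(1).
        return GaussStats(self.n + 1, self.sx + x, self.sxx + x * x)

    def merge(self, other):
        # Combine the statistics of two disjoint data sets.
        return GaussStats(self.n + other.n, self.sx + other.sx, self.sxx + other.sxx)

    def remove(self, other):
        # Subtract the statistics of a subset (used for data partitioning).
        return GaussStats(self.n - other.n, self.sx - other.sx, self.sxx - other.sxx)


def fit(stats):
    """Maximum-likelihood (mean, variance) computed from the statistic alone."""
    mean = stats.sx / stats.n
    var = max(stats.sxx / stats.n - mean * mean, 0.0)  # guard against rounding below zero
    return mean, var


def neg_log_likelihood(stats, min_var=1e-6):
    """Negative log-likelihood of the data under the fitted Gaussian.

    min_var is an assumed floor that avoids log(0) for degenerate segments.
    """
    mean, var_mle = fit(stats)
    var = max(var_mle, min_var)
    return 0.5 * stats.n * (math.log(2 * math.pi * var) + var_mle / var)


def best_changepoint(xs):
    """Return the split index k (1 <= k < len(xs)) that minimises the total
    cost of modelling xs[:k] and xs[k:] with separate Gaussians."""
    total = GaussStats()
    for x in xs:
        total = total.add(x)
    left = GaussStats()
    best_k, best_cost = None, math.inf
    for k in range(1, len(xs)):
        left = left.add(xs[k - 1])      # O(1) update of the left segment
        right = total.remove(left)      # O(1) statistic for the right segment
        cost = neg_log_likelihood(left) + neg_log_likelihood(right)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

For example, best_changepoint([0.1, -0.2, 0.0, 0.2, 10.1, 9.8, 10.0, 10.2]) selects the split between the fourth and fifth values. The search itself never inspects the raw data beyond feeding it to add, which is the sense in which a generalized algorithm can rely only on the statistic and its manipulation functions.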
1 Introduction
The formal specification of a statistical model type is an important ingredient
of machine learning software frameworks [1]. In the interests of software reuse,
robustness, and applicability, the model type should encompass a general notion
of a statistical model, and allow generalized machine learning algorithms to op
