 
Summary: THE SECOND INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL (ICTIR 2009)
Modeling score distributions in information retrieval
Avi Arampatzis · Stephen Robertson
Received: 10 August 2010 / Accepted: 10 August 2010 / Published online: 26 August 2010
© Springer Science+Business Media, LLC 2010
Abstract We review the history of modeling score distributions, focusing on the mixture
of normal-exponential by investigating the theoretical as well as the empirical evidence
supporting its use. We discuss previously suggested conditions which valid binary mixture
models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two
new hypotheses considering the component distributions, individually as well as in pairs,
under some limiting conditions of parameter values. From all the mixtures suggested in the
past, the current theoretical argument points to the two gamma as the most-likely universal
model, with the normal-exponential being a usable approximation. Beyond the theoretical
contribution, we provide new experimental evidence showing vector space or geometric
models, and BM25, as being `friendly' to the normal-exponential, and that the non-convexity
problem that the mixture possesses is practically not severe. Furthermore, we review
recent non-binary mixture models, speculate on graded relevance, and consider methods
such as logistic regression for score calibration.
Keywords Score distribution · Normalization · Distributed retrieval · Fusion · Filtering
1 Introduction
