 
Summary: Assessing and Comparing Classification Algorithms
Ethem Alpaydın
Department of Computer Engineering
Boğaziçi University
TR80815 Istanbul, Turkey
alpaydin@boun.edu.tr
Abstract
Machine learning algorithms induce classifiers that depend on the training set and hyperparameters, and statistical testing is needed for
(i) assessing the expected error rate of a classifier, and (ii) comparing the expected error rates of two classifiers. We review interval estimation
and hypothesis testing, and discuss three tests for error rate assessment and four tests for error rate comparison.
1 Introduction
In the machine learning literature, there are several classification
algorithms, and for a given application more than
one is applicable. In this paper, we are concerned with two
questions:
1. How can we assess the error rate of a classifier induced
by an algorithm?
2. Given two classification algorithms, how can we say that
one is better than the other for a given application?
The classifiers compared can be trained with different
