Summary: Discretization of Continuous Attributes for
Learning Classification Rules
Aijun An and Nick Cercone
Department of Computer Science, University of Waterloo
Waterloo, Ontario N2L 3G1 Canada
Abstract. We present a comparison of three entropy-based discretization
methods in the context of learning classification rules. We compare
binary recursive discretization with a stopping criterion based on
the Minimum Description Length Principle (MDLP) [3], a non-recursive
method that simply chooses a number of cut-points with the highest
entropy gains, and a non-recursive method that selects cut-points according
to both information entropy and the distribution of potential cut-points
over the instance space. Our empirical results show that the third method
gives the best predictive performance of the three methods tested.
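The first two methods compared above can be sketched as follows. This is a minimal illustration, not the authors' implementation. All function names (`entropy`, `candidate_cuts`, `evaluate_cut`, `mdlp_accepts`, `mdlp_cuts`, `top_k_cuts`) are hypothetical, and the MDLP test follows the usual statement of Fayyad and Irani's criterion. For simplicity, the sketch assumes that every midpoint between adjacent distinct values is a candidate cut, rather than restricting candidates to class boundary points. The third method is not sketched because its exact way of weighing the distribution of cut-points is not given here.

```python
import math
from collections import Counter

def entropy(labels):
    """Class information entropy Ent(S), in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def candidate_cuts(values, labels):
    """Assumption: all midpoints between adjacent distinct sorted values."""
    vs = sorted(values)
    return sorted({(a + b) / 2 for a, b in zip(vs, vs[1:]) if a != b})

def evaluate_cut(values, labels, cut):
    """Return (entropy gain, left labels, right labels) for a binary cut."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted, left, right

def mdlp_accepts(labels, left, right, gain):
    """MDLP stopping test: accept the cut only if the gain exceeds
    (log2(N - 1) + Delta) / N."""
    n = len(labels)
    k, k1, k2 = (len(set(s)) for s in (labels, left, right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right))
    return gain > (math.log2(n - 1) + delta) / n

def mdlp_cuts(values, labels):
    """Method 1: binary recursive splitting, stopped by the MDLP test."""
    cuts = candidate_cuts(values, labels)
    if not cuts:
        return []
    # Pick the cut with the highest entropy gain on the current subset.
    gain, cut = max((evaluate_cut(values, labels, c)[0], c) for c in cuts)
    _, left, right = evaluate_cut(values, labels, cut)
    if not mdlp_accepts(labels, left, right, gain):
        return []
    # Recurse on each side; filtering preserves the value/label alignment.
    left_vals = [x for x in values if x <= cut]
    right_vals = [x for x in values if x > cut]
    return mdlp_cuts(left_vals, left) + [cut] + mdlp_cuts(right_vals, right)

def top_k_cuts(values, labels, k):
    """Method 2: non-recursive; keep the k cuts with the highest gain,
    each gain measured once on the full data set."""
    scored = sorted(((evaluate_cut(values, labels, c)[0], c)
                     for c in candidate_cuts(values, labels)), reverse=True)
    return sorted(c for _, c in scored[:k])
```

For values `1` through `8` labelled `a, a, a, a, b, b, b, b`, `mdlp_cuts` returns the single cut `[4.5]`, because both resulting halves are pure and further splits fail the MDLP test. `top_k_cuts(..., 3)` returns `[3.5, 4.5, 5.5]`: the non-recursive method adds the next-highest-gain cuts even though they split pure regions.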
1 Introduction
Recent work on entropy-based discretization of continuous attributes has produced
positive results [2, 6]. One promising method is Fayyad and Irani's binary
recursive discretization with a stopping criterion based on the Minimum Description
Length Principle (MDLP) [3]. The MDLP method is reported to be successful
for discretization in decision tree learning and Naive-Bayes learn-