Summary: Feature Selection with Rough Sets for Web Page Classification
Aijun An¹, Yanhui Huang², Xiangji Huang¹, and Nick Cercone³
¹ York University, Toronto, Ontario, M3J 1P3, Canada
² University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
³ Dalhousie University, Halifax, Nova Scotia, B3H 1W5, Canada
Abstract. Web page classification is the problem of assigning predefined categories
to web pages. A key challenge in web page classification is the high dimensionality
of the feature space. We present a feature reduction method based on rough set
theory and investigate its effectiveness for web page classification. Our
experiments indicate that rough set feature selection can improve predictive
performance when the original feature set for representing web pages is large.
1 Introduction