On Strategies for Imbalanced Text Classification Using SVM: A Comparative Study
 

Aixin Sun
School of Computer Engineering, Nanyang Technological University, Singapore
Ee-Peng Lim
School of Information Systems, Singapore Management University, Singapore
Ying Liu
Department of Industrial and Systems Engineering, Hong Kong Polytechnic University, Hong Kong
Abstract
Many real-world text classification tasks involve imbalanced training examples. The
strategies proposed to address imbalanced classification (e.g., resampling, instance
weighting), however, have not been systematically evaluated in the text domain. In
this paper, we conduct a comparative study of the effectiveness of these strategies in the
context of imbalanced text classification using the Support Vector Machine (SVM) classifier.
SVM is the focus of this study because of the good classification accuracy reported for it in many
text classification tasks. We propose a taxonomy that organizes the proposed strategies
according to the training and test phases of text classification tasks. Based on the
taxonomy, we survey the methods proposed to address imbalanced classification.
Among them, 10 commonly used methods were evaluated in our experiments on three
benchmark datasets, i.e., Reuters-21578, 20-Newsgroups, and WebKB. Using the area

  

Source: Aixin, Sun - School of Computer Engineering, Nanyang Technological University

 

Collections: Computer Technologies and Information Sciences