Improving Generalization by Data Categorization
Ling Li, Amrit Pratap, Hsuan-Tien Lin, and Yaser S. Abu-Mostafa

Learning Systems Group, California Institute of Technology, USA
Abstract. In most learning algorithms, all examples in the training
set are treated equally. Some examples, however, carry more reliable or
more critical information about the target than others, and some may carry
wrong information. According to their intrinsic margin, examples can be
grouped into three categories: typical, critical, and noisy. We propose
three methods, namely the selection cost, the SVM confidence margin, and
the AdaBoost data weight, to automatically group training examples into
these three categories. Experimental results on artificial datasets show
that, although the three methods are quite different in nature, they give
similar and reasonable categorizations. Results on real-world datasets
further demonstrate that treating the three data categories differently in
learning can improve generalization.
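Margin-based grouping can be illustrated with a short sketch. It assumes that every example already has a real-valued margin score y·f(x) from some trained classifier f. Examples far on the wrong side of f are labeled noisy, examples near the boundary critical, and examples far on the correct side typical. This is not the paper's selection-cost, SVM, or AdaBoost procedure. The functions `margin` and `categorize`, the thresholds `noisy_below` and `typical_above`, and the scorer `f` in the usage lines are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# An example is (input vector x, label y) with y in {-1, +1}.
Example = Tuple[Sequence[float], int]


def margin(f: Callable[[Sequence[float]], float], x: Sequence[float], y: int) -> float:
    """Signed margin y * f(x): positive when f puts (x, y) on the correct side."""
    return y * f(x)


def categorize(examples: List[Example],
               f: Callable[[Sequence[float]], float],
               noisy_below: float = -0.5,
               typical_above: float = 0.5) -> List[str]:
    """Label each example 'noisy', 'critical' or 'typical' by its margin.

    The default thresholds are hypothetical, not values taken from the paper.
    """
    if noisy_below > typical_above:
        raise ValueError("noisy_below must not exceed typical_above")
    labels = []
    for x, y in examples:
        m = margin(f, x, y)
        if m < noisy_below:
            labels.append("noisy")      # confidently on the wrong side
        elif m < typical_above:
            labels.append("critical")   # close to the decision boundary
        else:
            labels.append("typical")    # confidently on the correct side
    return labels


if __name__ == "__main__":
    # Hypothetical linear scorer f(x) = x1 + x2 - 1.
    f = lambda x: x[0] + x[1] - 1.0
    data = [([2.0, 2.0], 1), ([0.6, 0.6], 1), ([0.0, 0.0], 1), ([0.0, 0.2], -1)]
    print(categorize(data, f))  # ['typical', 'critical', 'noisy', 'typical']
```

The two thresholds define a band around the boundary. Widening the band moves more examples into the critical group. A learner could then, for instance, drop or down-weight the noisy group. The abstract reports only that treating the categories differently can help, and does not say how.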
1 Introduction
Machine learning is an alternative approach to system design. Instead of the
conventional way of mathematically modeling the system, the role of learning
is to take a dataset of examples, such as input-output pairs from an unknown
target function, and synthesize a hypothesis that best approximates the target.
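The following generic sketch shows what "synthesizing a hypothesis" from input-output pairs can mean. It is not the paper's method. The hypothesis set is an assumed family of one-dimensional threshold classifiers h_t(x) = +1 if x ≥ t, else -1. From that family, the sketch picks the threshold with the fewest training errors. The names `h`, `training_errors`, and `learn_threshold` are hypothetical.

```python
from typing import List, Tuple


def h(t: float, x: float) -> int:
    """Threshold hypothesis: +1 if x >= t, else -1."""
    return 1 if x >= t else -1


def training_errors(t: float, data: List[Tuple[float, int]]) -> int:
    """Number of training pairs (x, y) that h_t misclassifies."""
    return sum(1 for x, y in data if h(t, x) != y)


def learn_threshold(data: List[Tuple[float, int]]) -> float:
    """Return the threshold with the fewest training errors.

    Candidates are every input value plus one value above the maximum,
    so the all-(-1) hypothesis is considered too. Ties go to the smallest t.
    """
    if not data:
        raise ValueError("need at least one example")
    xs = sorted(x for x, _ in data)
    candidates = xs + [xs[-1] + 1.0]
    return min(candidates, key=lambda t: training_errors(t, data))


if __name__ == "__main__":
    data = [(0.5, -1), (1.0, -1), (2.0, 1), (3.0, 1)]
    print(learn_threshold(data))  # 2.0, with zero training errors
```

If a mislabeled pair such as (2.5, -1) is added, no threshold fits every example. The best candidate still makes one training error. This is the kind of example that the abstract calls noisy.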


Source: Abu-Mostafa, Yaser S. - Department of Mechanical Engineering & Computer Science Department, California Institute of Technology

