An Effective Support Vector Machines (SVM) Performance Using Hierarchical Clustering Mamoun Awad, Latifur Khan, Farokh Bastani, and I-Ling Yen
 

Department of Computer Science
University of Texas at Dallas
Richardson, TX 75083-0688
Email: [maa013600, lkhan, bastani, ilyen]@utdallas.edu
ABSTRACT
Support Vector Machines map training data to a higher-dimensional space and then find the maximum-margin hyperplane
that separates the data. However, the training time for SVM to compute this hyperplane is at least $O(N^2)$ in the data
set size N, which makes SVM unfavorable for large data sets. This paper presents a study on reducing the training time of
SVM on large data sets by using hierarchical clustering analysis. We use the Dynamically Growing Self-Organizing Tree
(DGSOT) algorithm for clustering because it has been shown to overcome the drawbacks of traditional hierarchical
clustering algorithms (e.g., hierarchical agglomerative clustering). Clustering analysis helps find the boundary points
between the two classes, which are the most qualified data points for training SVM. We present a new approach that
combines SVM and DGSOT: it starts with an initial training set and expands it gradually using the clustering structure
produced by the DGSOT algorithm. We propose variations of this approach that add extra steps to disqualify unimportant
data points. First, we exclude points based on the distance between data points from different trees. Next, we exclude
unimportant data points based on the heterogeneity of the clusters. We compare our
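Because the argument rests on the quadratic training cost, a short worked comparison shows why shrinking the training set pays off. Here r is the fraction of the N points kept for training; the values N = 10^5 and r = 0.1 are illustrative assumptions, not figures from the paper, and the cost of the clustering step itself is left out.

```latex
\begin{aligned}
\frac{(rN)^2}{N^2} &= r^2 \\
N = 10^5,\ r = 0.1:\quad N^2 &= 10^{10},\quad (rN)^2 = 10^{8},\quad \frac{10^{8}}{10^{10}} = 10^{-2}
\end{aligned}
```

Keeping 10% of the points thus shrinks the quadratic term a hundredfold, which is the motivation for selecting only boundary points.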

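The boundary-point idea can be sketched roughly as follows. This is not DGSOT: as a simplifying assumption, each class is clustered separately with plain k-means (deterministic seeding from evenly spaced sorted points), and the names `kmeans`, `boundary_points`, `k` and `keep` are hypothetical. Clusters of one class whose centroids lie closest to a centroid of the other class are treated as boundary clusters, and only their points are kept as the reduced SVM training set.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means, used here only as a stand-in for DGSOT (an assumption).

    Seeds are evenly spaced positions in the sorted points, so results are
    deterministic. Requires len(points) >= k. Returns (centroids, groups).
    """
    ordered = sorted(points)
    step = len(ordered) // k
    centroids = [ordered[i * step + step // 2] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[j].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups


def boundary_points(pos, neg, k=4, keep=2):
    """Keep the points of the `keep` clusters per class that face the other class."""
    pos_c, pos_g = kmeans(pos, k)
    neg_c, neg_g = kmeans(neg, k)

    def select(own_c, own_g, other_c):
        # Rank non-empty clusters by the distance from their centroid to the
        # closest centroid of the opposite class (nearer = closer to the boundary).
        ranked = sorted(
            (i for i in range(len(own_c)) if own_g[i]),
            key=lambda i: min(math.dist(own_c[i], o) for o in other_c),
        )
        return [p for i in ranked[:keep] for p in own_g[i]]

    return select(pos_c, pos_g, neg_c), select(neg_c, neg_g, pos_c)


# Two toy classes: x in 0..19 and x in 25..44, each on two rows (y = 0, 1).
pos = [(float(x), float(y)) for x in range(20) for y in range(2)]
neg = [(float(x), float(y)) for x in range(25, 45) for y in range(2)]
sel_pos, sel_neg = boundary_points(pos, neg)
print(len(sel_pos), len(sel_neg))  # 20 20
```

On this toy data the kept positive points are those with x from 10 to 19 and the kept negative points those with x from 25 to 34, i.e. the halves facing each other. The approach described in the abstract goes further and expands the training set gradually through the cluster hierarchy; this flat sketch does not attempt that.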
  

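The first exclusion step, dropping points based on their distance to points of the other class's tree, might look roughly like this. The fixed cut-off `radius` and the function name `exclude_far_points` are hypothetical; the abstract does not state the exact distance criterion, so this is a sketch under that assumption only.

```python
import math


def exclude_far_points(own, other, radius):
    """Drop points of `own` whose nearest point in `other` is farther than `radius`.

    `radius` is a hypothetical tuning parameter, not a value from the paper.
    """
    kept = []
    for p in own:
        # Distance to the closest point of the opposite class.
        nearest = min(math.dist(p, q) for q in other)
        if nearest <= radius:
            kept.append(p)
    return kept


# Two 1-D classes separated by a gap between x = 4 and x = 7.
pos = [(float(x),) for x in range(5)]       # x = 0..4
neg = [(float(x),) for x in range(7, 12)]   # x = 7..11
print(exclude_far_points(pos, neg, radius=4.0))  # [(3.0,), (4.0,)]
print(exclude_far_points(neg, pos, radius=4.0))  # [(7.0,), (8.0,)]
```

The brute-force nearest-neighbour search costs |own| × |other| distance evaluations, so in practice it would only make sense on a candidate set that clustering has already reduced.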
Source: Awad, Mamoun Adel - College of Information Technology, United Arab Emirates University

 

Collections: Computer Technologies and Information Sciences