 
An Effective Support Vector Machines (SVM) Performance Using Hierarchical Clustering
Mamoun Awad, Latifur Khan, Farokh Bastani, and I-Ling Yen
Department of Computer Science
University of Texas at Dallas
Richardson, TX 75083-0688
Email: [maa013600, lkhan, bastani, ilyen]@utdallas.edu
ABSTRACT
Support Vector Machines map training data to higher dimensional space, and then find the maximal marginal hyperplane
to separate the data. However, the training time for SVM to compute the maximal marginal hyperplane is at least O(N²) with the data set size N, which makes it unfavorable for large data sets. This paper presents a study for enhancing the
training time of SVM, specifically when dealing with large data sets, using hierarchical clustering analysis. We use the
Dynamically Growing Self-Organizing Tree (DGSOT) algorithm for clustering because it has proved to overcome the
drawbacks of traditional hierarchical clustering algorithms (e.g., hierarchical agglomerative clustering). Clustering analysis
helps identify the boundary points between two classes, which are the most qualified data points for training SVM. We present a
new approach that combines SVM and DGSOT, starting with an initial training set and expanding it gradually using
the clustering structure produced by the DGSOT algorithm. We propose variations of this approach by adding extra steps
to disqualify unimportant data points. First, we exclude points based on measuring the distance between data points from
different trees. Next, we exclude unimportant data points based on the heterogeneity of the clusters. We compare our
