SpectralCAT: Categorical Spectral Clustering of Numerical and Nominal Data
Gil David (a), Amir Averbuch (b)
(a) Department of Mathematics, Program in Applied Mathematics, Yale University, New Haven CT 06510, USA
(b) School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel
Abstract
Data clustering is a common data analysis technique used in many fields, including machine
learning, data mining, customer segmentation, trend analysis, pattern recognition and image analysis.
Although many clustering algorithms have been proposed, most of them deal with clustering of one data
type (numerical or nominal) or with mixed data types (numerical and nominal), and only a few of them
provide a generic method that clusters all types of data. Most real-world applications need
to handle both feature types and their mix. In this paper, we propose an automated technique, called
SpectralCAT, for unsupervised clustering of high-dimensional data that contains numerical attributes,
nominal attributes, or a mix of both. We propose to automatically transform the high-dimensional input
data into categorical values. This is done by discovering the optimal transformation according to the
Calinski-Harabasz index for each feature and attribute in the dataset. Then, a method for spectral
clustering via dimensionality reduction of the transformed data is applied. This is achieved by
automatic non-linear transformations, which identify geometric patterns in the data and find the
connections among them while projecting them onto low-dimensional spaces. We compare our method to
several clustering algorithms using 16 public datasets
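
The categorization step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how candidate transformations are generated, so this example assumes one plausible reading: each numerical feature is clustered on its own with 1-D k-means for several values of k, and the k with the highest Calinski-Harabasz index is kept. The function names (`kmeans_1d`, `calinski_harabasz`, `categorize_feature`), the quantile initialisation, and the `k_max=5` default are all hypothetical choices made for illustration, not taken from the paper. The spectral clustering stage is not shown.

```python
from statistics import mean


def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
            for x in values]


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on one feature, with deterministic quantile init (assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        labels = assign(values, centers)
        updated = []
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(mean(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return assign(values, centers)


def calinski_harabasz(values, labels, k):
    """CH(k) = [B / (k - 1)] / [W / (n - k)] for a single feature."""
    n = len(values)
    overall = mean(values)
    between = within = 0.0
    for j in range(k):
        members = [x for x, lab in zip(values, labels) if lab == j]
        if not members:
            continue
        center = mean(members)
        between += len(members) * (center - overall) ** 2
        within += sum((x - center) ** 2 for x in members)
    if within == 0:
        return float("inf")  # every cluster is a single repeated value
    return (between / (k - 1)) / (within / (n - k))


def categorize_feature(values, k_max=5):
    """Return (number of categories, category label per value) for one feature."""
    distinct = sorted(set(values))
    best = None
    for k in range(2, min(k_max, len(distinct) - 1) + 1):
        labels = kmeans_1d(values, k)
        score = calinski_harabasz(values, labels, k)
        if best is None or score > best[0]:
            best = (score, k, labels)
    if best is None:
        # Too few distinct values to compare: treat each value as its own category.
        index = {v: i for i, v in enumerate(distinct)}
        return len(distinct), [index[v] for v in values]
    return best[1], best[2]


if __name__ == "__main__":
    feature = [0.7, 5.1, 9.3, 0.9, 4.7, 8.7, 1.1, 4.9, 9.1, 1.3, 5.3, 8.9]
    k, categories = categorize_feature(feature)
    print(k, categories)
```

Nominal features are already categorical and would pass through unchanged under this reading; only numerical features need the per-feature search.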

  

Source: Averbuch, Amir - School of Computer Science, Tel Aviv University

 

Collections: Computer Technologies and Information Sciences