
Hierarchical Density-Based Clustering of Categorical Data and a Simplification

Bill Andreopoulos, Aijun An, and Xiaogang Wang
York University, Dept. of Computer Science and Engineering,
Toronto Ontario, M3J 1P3, Canada
{billa, aan}@cse.yorku.ca, stevenw@mathstat.yorku.ca
Abstract. A challenge in applying density-based clustering to categorical datasets is that the 'cube' of attribute values has no ordering defined. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data. HIERDENC offers a basis for designing simpler clustering algorithms that balance the tradeoff between accuracy and speed. The characteristics of HIERDENC include: (i) it builds a hierarchy representing the underlying cluster structure of the categorical dataset, (ii) it minimizes the number of user-specified input parameters, (iii) it is insensitive to the order of object input, and (iv) it can handle outliers. We evaluate HIERDENC on low-dimensional standard categorical datasets, on which it produces more accurate results than other algorithms. We also present a faster simplification of HIERDENC called the MULIC algorithm. MULIC performs better than subspace clustering algorithms at finding the multi-layered structure of special datasets.
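The abstract does not say how density is measured when attribute values have no ordering. One common way to make the idea concrete, assumed here for illustration and not taken from the paper, is to count how many objects differ from a given object in at most r attributes (a Hamming-distance neighborhood of radius r). The helper names `hamming`, `neighborhood_density`, and `densest_object` are hypothetical; this is a minimal sketch of that density notion, not the HIERDENC algorithm.

```python
from typing import Sequence, Tuple

# A categorical object is a tuple of attribute values, e.g. ("red", "small", "round").
# Values are compared only for equality: no ordering between them is assumed.
CatObject = Tuple[str, ...]


def hamming(a: CatObject, b: CatObject) -> int:
    """Return the number of attributes on which two objects differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def neighborhood_density(center: CatObject, data: Sequence[CatObject], r: int) -> int:
    """Count the objects in `data` that differ from `center` in at most r attributes.

    Hypothetical helper (assumed density definition): `center` itself is
    counted if it appears in `data`.
    """
    return sum(1 for obj in data if hamming(center, obj) <= r)


def densest_object(data: Sequence[CatObject], r: int) -> CatObject:
    """Return the object whose radius-r neighborhood holds the most objects.

    Ties are broken by comparing the attribute tuples themselves, so the
    result does not depend on the order in which `data` is listed.
    """
    return max(data, key=lambda obj: (neighborhood_density(obj, data, r), obj))


if __name__ == "__main__":
    data = [
        ("a", "x", "p"),
        ("a", "x", "q"),
        ("a", "y", "p"),
        ("b", "z", "r"),
    ]
    # Widening the radius r merges more objects into each neighborhood.
    for r in range(4):
        best = densest_object(data, r)
        print(r, best, neighborhood_density(best, data, r))
```

Running the script prints, for each radius r from 0 to 3, the object with the densest neighborhood and its count. Breaking ties by the attribute values rather than by input position keeps the result independent of input order, in the spirit of characteristic (iii); how HIERDENC actually builds its hierarchy and handles outliers is not reproduced here.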


Source: An, Aijun - Department of Computer Science, York University (Toronto)

