U.S. Department of Energy
Office of Scientific and Technical Information

Adaptive dimension reduction for clustering high dimensional data

Technical Report ·
DOI: https://doi.org/10.2172/807420 · OSTI ID: 807420
It is well known that, for high dimensional data clustering, standard algorithms such as EM and K-means are often trapped in local minima. Many initialization methods have been proposed to tackle this problem, but with only limited success. In this paper, the authors propose a new approach that resolves this problem by repeated dimension reductions, such that K-means or EM is performed only in very low dimensions. Cluster membership is utilized as a bridge between the reduced-dimensional subspace and the original space, providing flexibility and ease of implementation. Clustering analyses performed on highly overlapped Gaussians, DNA gene expression profiles, and internet newsgroups demonstrate the effectiveness of the proposed algorithm.
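The abstract gives no algorithmic detail, so the following is only a minimal sketch of how the idea might look in practice, not the report's actual procedure. It assumes three things that the abstract does not state: the first subspace comes from PCA, the low-dimensional clustering is plain Lloyd's K-means, and each new subspace is spanned by the cluster centroids computed in the original space. The function names `kmeans` and `adaptive_reduction_kmeans` and all parameters are hypothetical.

```python
import numpy as np


def kmeans(Y, k, centers, n_iter=50):
    """Plain Lloyd's K-means on Y, starting from the given centers."""
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = Y[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def adaptive_reduction_kmeans(X, k, n_rounds=5, seed=0):
    """Alternate low-dimensional K-means with subspace updates.

    Assumes k >= 2 and that X has at least k columns.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)

    # Assumption: start from the top k-1 principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[: k - 1].T
    Y = Xc @ basis
    centers_low = Y[rng.choice(len(Y), size=k, replace=False)]

    labels = None
    for _ in range(n_rounds):
        Y = Xc @ basis  # project onto the current subspace
        new_labels, centers_low = kmeans(Y, k, centers_low)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # membership stopped changing
        labels = new_labels

        # Bridge step: membership found in the subspace gives centroids
        # in the original space.
        centroids = np.zeros((k, Xc.shape[1]))
        for j in range(k):
            members = Xc[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)

        # Assumption: the next subspace is an orthonormal basis of the
        # span of those centroids.
        basis, _ = np.linalg.qr(centroids.T)
        centers_low = centroids @ basis
    return labels
```

Only the membership vector crosses between the two spaces, which is the property the abstract highlights. Because the sketch still uses random initialization, it offers no guarantee against poor local minima.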
Research Organization:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Organization:
USDOE Director, Office of Science. Office of Advanced Scientific Computing Research. Mathematical, Information, and Computational Sciences Division (US)
DOE Contract Number:
AC03-76SF00098
OSTI ID:
807420
Report Number(s):
LBNL--51472; B&R KP1201020
Country of Publication:
United States
Language:
English
