Summary: Nonlinear Mapping of Massive Data Sets
by Fuzzy Clustering and Neural Networks
DMITRII N. RASSOKHIN, VICTOR S. LOBANOV,
DIMITRIS K. AGRAFIOTIS
3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Exton, Pennsylvania 19341
Received 28 March 2000; accepted 20 July 2000
ABSTRACT: Producing good low-dimensional representations of
high-dimensional data is a common and important task in many data mining
applications. Two methods that have been particularly useful in this regard are
multidimensional scaling and nonlinear mapping. These methods attempt to
visualize a set of objects described by means of a dissimilarity or distance matrix
on a low-dimensional display plane in a way that preserves the proximities of the
objects as faithfully as possible. Unfortunately, most known algorithms scale
quadratically with the number of objects, so their use has been limited to relatively small data sets. We
recently demonstrated that nonlinear maps derived from a small random sample
of a large data set exhibit the same structure and characteristics as those of the
entire collection, and that this structure can be easily extracted by a neural
network, making it possible to scale data sets orders of magnitude larger than
those accessible with conventional methods. Here, we present a variant of
this algorithm based on local learning. The method employs a fuzzy clustering