DOE PAGES

This content will become publicly available on June 21, 2019

Title: Distance Metrics and Clustering Methods for Mixed-type Data

In spite of the abundance of clustering techniques and algorithms, clustering mixed interval (continuous) and categorical (nominal and/or ordinal) scale data remains a challenging problem. To identify the most effective approaches for clustering mixed-type data, we use both theoretical and empirical analyses to present a critical review of the strengths and weaknesses of the methods identified in the literature. Guidelines on which approaches to use under different scenarios are provided, along with potential directions for future research.
Authors:
Foss, Alexander H. [1]; Markatou, Marianthi [1]; Ray, Bonnie [2]
  1. Univ. at Buffalo, Buffalo, NY (United States)
  2. Arenadotio, New York, NY (United States)
Publication Date:
Report Number(s):
SAND-2018-7091J
Journal ID: ISSN 0306-7734; 665360
Grant/Contract Number:
AC04-94AL85000
Type:
Accepted Manuscript
Journal Name:
International Statistical Review
Additional Journal Information:
Journal Name: International Statistical Review; Journal ID: ISSN 0306-7734
Research Org:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org:
USDOE National Nuclear Security Administration (NNSA)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; discretisation; dummy coding; Gower's distance; k-means clustering; machine learning; Mahalanobis distance; mixture model; multivariate data analysis; unsupervised learning
OSTI Identifier:
1459931

Foss, Alexander H., Markatou, Marianthi, and Ray, Bonnie. Distance Metrics and Clustering Methods for Mixed-type Data. United States: N. p., Web. doi:10.1111/insr.12274.
Foss, Alexander H., Markatou, Marianthi, & Ray, Bonnie. Distance Metrics and Clustering Methods for Mixed-type Data. United States. doi:10.1111/insr.12274.
Foss, Alexander H., Markatou, Marianthi, and Ray, Bonnie. 2018. "Distance Metrics and Clustering Methods for Mixed-type Data". United States. doi:10.1111/insr.12274.
@article{osti_1459931,
title = {Distance Metrics and Clustering Methods for Mixed-type Data},
author = {Foss, Alexander H. and Markatou, Marianthi and Ray, Bonnie},
abstractNote = {In spite of the abundance of clustering techniques and algorithms, clustering mixed interval (continuous) and categorical (nominal and/or ordinal) scale data remains a challenging problem. To identify the most effective approaches for clustering mixed-type data, we use both theoretical and empirical analyses to present a critical review of the strengths and weaknesses of the methods identified in the literature. Guidelines on which approaches to use under different scenarios are provided, along with potential directions for future research.},
doi = {10.1111/insr.12274},
journal = {International Statistical Review},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {6}
}

Works referenced in this record:

Estimating the number of clusters in a data set via the gap statistic
journal, May 2001
  • Tibshirani, Robert; Walther, Guenther; Hastie, Trevor
  • Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 63, Issue 2, p. 411-423
  • DOI: 10.1111/1467-9868.00293