OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Distance Metrics and Clustering Methods for Mixed-type Data

Abstract

Despite the abundance of clustering techniques and algorithms, clustering mixed interval (continuous) and categorical (nominal and/or ordinal) scale data remains a challenging problem. To identify the most effective approaches for clustering mixed-type data, we use both theoretical and empirical analyses to present a critical review of the strengths and weaknesses of the methods identified in the literature. Guidelines on which approaches to use under different scenarios are provided, along with potential directions for future research.

Authors:
Foss, Alexander H. [1]; Markatou, Marianthi [1]; Ray, Bonnie [2]
  1. Univ. at Buffalo, Buffalo, NY (United States)
  2. Arenadotio, New York, NY (United States)
Publication Date:
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1459931
Report Number(s):
SAND-2018-7091J
Journal ID: ISSN 0306-7734; 665360
Grant/Contract Number:  
AC04-94AL85000
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
International Statistical Review
Additional Journal Information:
Journal Volume: 87; Journal Issue: 1; Journal ID: ISSN 0306-7734
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; discretisation; dummy coding; Gower's distance; k-means clustering; machine learning; Mahalanobis distance; mixture model; multivariate data analysis; unsupervised learning

Citation Formats

Foss, Alexander H., Markatou, Marianthi, and Ray, Bonnie. Distance Metrics and Clustering Methods for Mixed-type Data. United States: N. p., 2018. Web. doi:10.1111/insr.12274.
Foss, Alexander H., Markatou, Marianthi, & Ray, Bonnie. Distance Metrics and Clustering Methods for Mixed-type Data. United States. doi:10.1111/insr.12274.
Foss, Alexander H., Markatou, Marianthi, and Ray, Bonnie. 2018. "Distance Metrics and Clustering Methods for Mixed-type Data". United States. doi:10.1111/insr.12274. https://www.osti.gov/servlets/purl/1459931.
@article{osti_1459931,
title = {Distance Metrics and Clustering Methods for Mixed-type Data},
author = {Foss, Alexander H. and Markatou, Marianthi and Ray, Bonnie},
abstractNote = {Despite the abundance of clustering techniques and algorithms, clustering mixed interval (continuous) and categorical (nominal and/or ordinal) scale data remains a challenging problem. To identify the most effective approaches for clustering mixed-type data, we use both theoretical and empirical analyses to present a critical review of the strengths and weaknesses of the methods identified in the literature. Guidelines on which approaches to use under different scenarios are provided, along with potential directions for future research.},
doi = {10.1111/insr.12274},
journal = {International Statistical Review},
issn = {0306-7734},
number = 1,
volume = 87,
place = {United States},
year = {2018},
month = {6}
}

Works referenced in this record:

Estimating the number of clusters in a data set via the gap statistic
journal, May 2001

  • Tibshirani, Robert; Walther, Guenther; Hastie, Trevor
  • Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 63, Issue 2, p. 411-423
  • DOI: 10.1111/1467-9868.00293