U.S. Department of Energy
Office of Scientific and Technical Information

Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids

Journal Article · Entropy
DOI: https://doi.org/10.3390/e23020149 · OSTI ID:1762549

A conceptually simple way to classify images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data cover configuration space. Here we show that this coverage can be substantially increased using coarse-graining (replacing groups of images by their centroids) and stochastic sampling (using distinct sets of centroids in combination). We use the MNIST and Fashion-MNIST data sets to show that a principled coarse-graining algorithm can convert training images into fewer image centroids without loss of accuracy of classification of test-set images by nearest-neighbor classification. Distinct batches of centroids can be used in combination as a means of stochastically sampling configuration space, and can classify test-set data more accurately than the unaltered training set can. On the MNIST and Fashion-MNIST data sets this approach converts nearest-neighbor classification from a mid-ranking to an upper-ranking member of the set of classical machine-learning techniques.
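
The two ideas in the abstract, coarse-graining and combining distinct batches of centroids, can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how groups are chosen or how batches are combined, so the sketch assumes random same-class grouping, mean centroids, Euclidean distance, and a combination rule that pools all batches and takes the single nearest centroid. The function names (`make_centroids`, `nn_predict`, `batched_predict`) and the synthetic two-class data are hypothetical stand-ins for MNIST-style image vectors.

```python
import numpy as np

def make_centroids(X, y, group_size, rng):
    """Coarse-grain: replace random same-class groups by their mean (assumed rule)."""
    cents, labels = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        Xc = Xc[rng.permutation(len(Xc))]          # shuffle so each batch differs
        n_groups = max(1, len(Xc) // group_size)
        for group in np.array_split(Xc, n_groups):
            cents.append(group.mean(axis=0))       # centroid of the group
            labels.append(c)
    return np.array(cents), np.array(labels)

def nn_predict(X_test, cents, labels):
    """Label each test point with the class of its nearest centroid."""
    # Squared Euclidean distances, shape (n_test, n_centroids).
    d2 = ((X_test[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return labels[d2.argmin(axis=1)]

def batched_predict(X_test, X, y, group_size, n_batches, seed=0):
    """Pool several independently drawn centroid batches (assumed combination rule)."""
    rng = np.random.default_rng(seed)
    batches = [make_centroids(X, y, group_size, rng) for _ in range(n_batches)]
    cents = np.concatenate([b[0] for b in batches])
    labels = np.concatenate([b[1] for b in batches])
    return nn_predict(X_test, cents, labels)

# Synthetic, well-separated two-class data standing in for flattened images.
data_rng = np.random.default_rng(1)
X = np.concatenate([data_rng.normal(0.0, 1.0, (200, 16)),
                    data_rng.normal(4.0, 1.0, (200, 16))])
y = np.repeat([0, 1], 200)
X_test = np.concatenate([data_rng.normal(0.0, 1.0, (20, 16)),
                         data_rng.normal(4.0, 1.0, (20, 16))])
y_test = np.repeat([0, 1], 20)

pred = batched_predict(X_test, X, y, group_size=10, n_batches=3)
print("accuracy:", (pred == y_test).mean())
```

With `group_size=10`, each batch holds 40 centroids in place of 400 training points, and three batches give 120 reference points in total. Whether such pooling improves accuracy on real image data depends on the grouping and comparison method, which this sketch does not reproduce.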

Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1762549
Alternate ID(s):
OSTI ID: 1816061
Journal Information:
Journal Name: Entropy; Journal Volume: 23; Journal Issue: 2; ISSN ENTRFG; ISSN 1099-4300
Publisher:
MDPI AG
Country of Publication:
Switzerland
Language:
English

Similar Records

Simple coarse graining and sampling strategies for image recognition
Journal Article · Fri Sep 07 2018 · arXiv.org Repository · OSTI ID:1601169

Classifying and analyzing small-angle scattering data using weighted k nearest neighbors machine learning techniques
Journal Article · Mon Feb 17 2020 · Journal of Applied Crystallography (Online) · OSTI ID:1649508