DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids

Abstract

A conceptually simple way to classify images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data cover configuration space. Here we show that this coverage can be substantially increased using coarse-graining (replacing groups of images by their centroids) and stochastic sampling (using distinct sets of centroids in combination). We use the MNIST and Fashion-MNIST data sets to show that a principled coarse-graining algorithm can convert training images into fewer image centroids without loss of accuracy of classification of test-set images by nearest-neighbor classification. Distinct batches of centroids can be used in combination as a means of stochastically sampling configuration space, and can classify test-set data more accurately than can the unaltered training set. On the MNIST and Fashion-MNIST data sets this approach converts nearest-neighbor classification from a mid-ranking to an upper-ranking member of the set of classical machine-learning techniques.
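The two ideas named in the abstract, coarse-graining and stochastic sampling, can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not describe the principled coarse-graining algorithm, so this sketch assumes random same-class grouping as a stand-in, and it assumes that using batches "in combination" means pooling their centroids into one reference set. The function names (`centroid`, `make_centroids`, `nearest_label`, `classify`) and the parameters `group_size`, `n_batches`, and `seed` are hypothetical.

```python
import random


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def make_centroids(X, y, group_size, rng):
    """Coarse-grain a training set into labelled centroids.

    ASSUMPTION: groups are formed by random shuffling within each class.
    The paper's principled grouping rule is not given in the abstract.
    """
    refs = []
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        rng.shuffle(members)
        for start in range(0, len(members), group_size):
            group = members[start:start + group_size]
            refs.append((centroid(group), label))
    return refs


def nearest_label(refs, x):
    """Return the label of the reference vector closest to x."""
    def squared_distance(ref):
        return sum((a - b) ** 2 for a, b in zip(ref[0], x))
    return min(refs, key=squared_distance)[1]


def classify(X, y, X_test, group_size=8, n_batches=3, seed=0):
    """Nearest-neighbor classification against several pooled centroid batches.

    ASSUMPTION: distinct batches are combined by pooling all centroids,
    each batch coming from a different random grouping of the same data.
    """
    rng = random.Random(seed)
    refs = []
    for _ in range(n_batches):
        refs.extend(make_centroids(X, y, group_size, rng))
    return [nearest_label(refs, x) for x in X_test]
```

Calling `classify(X, y, X_test)` with lists of feature vectors (for images, flattened pixel values) returns one predicted label per test vector. Each batch holds roughly `len(X) / group_size` centroids, so pooling a few batches can still give a smaller reference set than the full training set.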

Authors:
Whitelam, Stephen
Publication Date:
2021-01-26
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
OSTI Identifier:
1762549
Alternate Identifier(s):
OSTI ID: 1816061
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Published Article
Journal Name:
Entropy
Additional Journal Information:
Journal Name: Entropy; Journal Volume: 23; Journal Issue: 2; Journal ID: ISSN 1099-4300
Publisher:
MDPI AG
Country of Publication:
Switzerland
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; image recognition; nearest-neighbor classification; stochastic sampling

Citation Formats

Whitelam, Stephen. Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids. Switzerland: N. p., 2021. Web. doi:10.3390/e23020149.
Whitelam, Stephen. Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids. Switzerland. https://doi.org/10.3390/e23020149
Whitelam, Stephen. 2021. "Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids". Switzerland. https://doi.org/10.3390/e23020149.
@article{osti_1762549,
title = {Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids},
author = {Whitelam, Stephen},
abstractNote = {A conceptually simple way to classify images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data cover configuration space. Here we show that this coverage can be substantially increased using coarse-graining (replacing groups of images by their centroids) and stochastic sampling (using distinct sets of centroids in combination). We use the MNIST and Fashion-MNIST data sets to show that a principled coarse-graining algorithm can convert training images into fewer image centroids without loss of accuracy of classification of test-set images by nearest-neighbor classification. Distinct batches of centroids can be used in combination as a means of stochastically sampling configuration space, and can classify test-set data more accurately than can the unaltered training set. On the MNIST and Fashion-MNIST data sets this approach converts nearest-neighbor classification from a mid-ranking to an upper-ranking member of the set of classical machine-learning techniques.},
doi = {10.3390/e23020149},
journal = {Entropy},
number = 2,
volume = 23,
place = {Switzerland},
year = {2021},
month = {1}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.3390/e23020149
