Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids

Whitelam, Stephen

doi:10.3390/e23020149

Abstract

A conceptually simple way to classify images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data cover configuration space. Here we show that this coverage can be substantially increased using coarse-graining (replacing groups of images by their centroids) and stochastic sampling (using distinct sets of centroids in combination). We use the MNIST and Fashion-MNIST data sets to show that a principled coarse-graining algorithm can convert training images into fewer image centroids without loss of accuracy of classification of test-set images by nearest-neighbor classification. Distinct batches of centroids can be used in combination as a means of stochastically sampling configuration space, and can classify test-set data more accurately than can the unaltered training set. On the MNIST and Fashion-MNIST data sets this approach converts nearest-neighbor classification from a mid-ranking to an upper-ranking member of the set of classical machine-learning techniques.
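
The two ideas named in the abstract, coarse-graining and the combined use of distinct centroid batches, can be illustrated with a minimal sketch. It is not the paper's principled algorithm, which the abstract does not specify. The sketch assumes that groups are formed by random same-class partitioning, that comparison uses squared Euclidean distance, and that batches are combined by pooling all their centroids. The function names (`centroid`, `coarse_grain`, `classify`) and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed comparison measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def coarse_grain(samples, labels, group_size, rng):
    """Return (centroid, label) pairs, one per random same-class group.

    Random grouping is an assumption of this sketch; it stands in for
    the principled construction described in the abstract.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    batch = []
    for label, members in by_class.items():
        rng.shuffle(members)  # members is a local list, inputs are untouched
        for start in range(0, len(members), group_size):
            group = members[start:start + group_size]
            batch.append((centroid(group), label))
    return batch


def classify(query, batches):
    """Label of the centroid nearest to `query`, pooled over all batches."""
    pooled = (pair for batch in batches for pair in batch)
    nearest = min(pooled, key=lambda pair: squared_distance(query, pair[0]))
    return nearest[1]


# Toy two-class data standing in for flattened images (hypothetical values).
train_x = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

rng = random.Random(0)
# Three distinct batches: each regroups the same data with fresh randomness.
batches = [coarse_grain(train_x, train_y, group_size=2, rng=rng)
           for _ in range(3)]

print(len(batches[0]))                # 4 centroids from 8 training points
print(classify([0.2, 0.1], batches))  # 0
print(classify([5.8, 5.9], batches))  # 1
```

Each batch halves the number of reference points, and because every batch groups the data differently, the pooled centroids cover more distinct locations in configuration space than any single batch does.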

Authors: Whitelam, Stephen

Publication Date: 2021-01-26

Sponsoring Org.: USDOE Office of Science (SC), Basic Energy Sciences (BES)

OSTI Identifier: 1762549

Alternate Identifier(s): OSTI ID: 1816061

Grant/Contract Number: AC02-05CH11231

Resource Type: Published Article

Journal Name: Entropy

Additional Journal Information: Journal Name: Entropy; Journal Volume: 23; Journal Issue: 2; Journal ID: ISSN 1099-4300

Publisher: MDPI AG

Country of Publication: Switzerland

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; image recognition; nearest-neighbor classification; stochastic sampling

Citation Formats


Whitelam, Stephen. Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids. Switzerland: N. p., 2021. Web. doi:10.3390/e23020149.


Whitelam, Stephen. Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids. Switzerland. https://doi.org/10.3390/e23020149


Whitelam, Stephen. 2021. "Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids". Switzerland. https://doi.org/10.3390/e23020149.


                    
@article{osti_1762549,
  title        = {Improving the Accuracy of Nearest-Neighbor Classification Using Principled Construction and Stochastic Sampling of Training-Set Centroids},
  author       = {Whitelam, Stephen},
  doi          = {10.3390/e23020149},
  journal      = {Entropy},
  number       = 2,
  volume       = 23,
  place        = {Switzerland},
  year         = {2021},
  month        = jan
}

Journal Article:

Free Publicly Available Full Text

Publisher's Version of Record
https://doi.org/10.3390/e23020149

Similar Records in DOE PAGES and OSTI.GOV collections:

Simple coarse graining and sampling strategies for image recognition

Journal Article Whitelam, Stephen - arXiv.org Repository

A conceptually simple way to recognize images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data covers the required configuration space. Here we show that this coverage can be substantially increased using simple strategies of coarse graining (replacing groups of images by their centroids) and sampling (using distinct sets of centroids in combination). We use the MNIST data set to show that coarse graining can be used to convert a subset of training images into about an order ...
Full Text Available

Classifying and analyzing small-angle scattering data using weighted k nearest neighbors machine learning techniques

Journal Article Archibald, Richard ; Doucet, Mathieu ; Johnston, Travis ; ... - Journal of Applied Crystallography (Online)

A consistent challenge for both new and expert practitioners of small-angle scattering (SAS) lies in determining how to analyze the data, given the limited information content of said data and the large number of models that can be employed. Machine learning (ML) methods are powerful tools for classifying data that have found diverse applications in many fields of science. Here, ML methods are applied to the problem of classifying SAS data for the most appropriate model to use for data analysis. The approach employed is built around the method of weighted k nearest neighbors (wKNN), and utilizes a subset of ...
Cited by 18
https://doi.org/10.1107/s1600576720000552

Full Text Available

Adaptation of the fuzzy k-nearest neighbor classifier for manufacturing automation

Conference Tobin, K W ; Gleason, S S ; Karnowski, T P

The use of supervised pattern recognition technologies for automation in the manufacturing environment requires the development of systems that are easy to train and use. In general, these systems attempt to emulate an inspection or measurement function typically performed by a manufacturing engineer or technician. This paper describes a self-optimizing classification system for automatic decision making in the manufacturing environment. This classification system identifies and labels unique distributions of product defects denoted as signatures. The technique relies on encapsulating human experience through a teaching method to emulate the human response to various manufacturing situations. This has been successfully accomplished through ...
Full Text Available

ASK: Adversarial Soft k-Nearest Neighbor Attack and Defense

Journal Article Wang, Ren ; Chen, Tianqi ; Yao, Philip ; ... - IEEE Access

K-Nearest Neighbor (kNN)-based deep learning methods have been applied to many applications due to their simplicity and geometric interpretability. However, the robustness of kNN-based deep classification models has not been thoroughly explored and kNN attack strategies are underdeveloped. In this paper, we first propose an Adversarial Soft kNN (ASK) loss for developing more effective kNN-based deep neural network attack strategies and designing better defense methods against them. Our ASK loss provides a differentiable surrogate of the expected kNN classification error. It is also interpretable as it preserves the mutual information between the perturbed input and the in-class-reference data. We use ...
https://doi.org/10.1109/access.2022.3209243

Full Text Available

Error minimizing algorithms for nearest neighbor classifiers

Conference Porter, Reid B ; Hush, Don ; Zimmer, G Beate

Stack Filters define a large class of discrete nonlinear filters first introduced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. We ...
Full Text Available