DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Classifying and analyzing small-angle scattering data using weighted k nearest neighbors machine learning techniques

Abstract

A consistent challenge for both new and expert practitioners of small-angle scattering (SAS) lies in determining how to analyze the data, given the limited information content of said data and the large number of models that can be employed. Machine learning (ML) methods are powerful tools for classifying data that have found diverse applications in many fields of science. Here, ML methods are applied to the problem of classifying SAS data for the most appropriate model to use for data analysis. The approach employed is built around the method of weighted k nearest neighbors (wKNN), and utilizes a subset of the models implemented in the SasView package (https://www.sasview.org/) for generating a well defined set of training and testing data. The prediction rate of the wKNN method implemented here using a subset of SasView models is reasonably good for many of the models, but has difficulty with others, notably those based on spherical structures. A novel expansion of the wKNN method was also developed, which uses Gaussian processes to produce local surrogate models for the classification, and this significantly improves the classification accuracy. Further, by integrating a stochastic gradient descent method during post-processing, it is possible to leverage the local surrogate model both to classify the SAS data with high accuracy and to predict the structural parameters that best describe the data. The linking of data classification and model fitting has the potential to facilitate the translation of measured data into results for both novice and expert practitioners of SAS.
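
As a rough illustration of the weighted k nearest neighbors idea mentioned in the abstract, the sketch below classifies a curve by letting its k closest training curves vote, with closer neighbors counting more. It is not the authors' implementation. The inverse-distance weighting, the Euclidean distance on log-intensity, the two toy curve families (`guinier`, `power_law`) standing in for SasView-generated training data, and all function names are assumptions made for this example.

```python
import math
from collections import defaultdict

# Hypothetical q-grid (arbitrary units), used only for this toy example.
Q = [0.01 * i for i in range(1, 21)]

def guinier(rg):
    """Toy log-intensity of a Guinier-like curve with radius of gyration rg."""
    return [-((q * rg) ** 2) / 3.0 for q in Q]

def power_law(p):
    """Toy log-intensity of a power-law curve, normalized at the first q value."""
    return [-p * math.log(q / Q[0]) for q in Q]

def wknn_classify(train, query, k=3, eps=1e-12):
    """Classify `query` by inverse-distance-weighted voting among its k nearest
    training curves. `train` is a list of (feature_vector, label) pairs."""
    # Euclidean distance from the query to every training curve, nearest first.
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + eps)  # assumed weighting: closer counts more
    return max(votes, key=votes.get)

# Small labeled training set built from the two toy curve families.
train = [(guinier(rg), "guinier") for rg in (10, 20, 30)]
train += [(power_law(p), "power_law") for p in (2, 3, 4)]

# Classify two curves whose parameters are not in the training set.
label_a = wknn_classify(train, guinier(22), k=3)
label_b = wknn_classify(train, power_law(2.8), k=3)
print(label_a, label_b)
```

In this toy setting each query curve lies much closer to a training curve of its own family than to any curve of the other family, so the weighted vote is dominated by that neighbor. Real SAS curves from different models can overlap far more, which is consistent with the difficulty the abstract reports for some model classes.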

Authors:
Archibald, Richard [1]; Doucet, Mathieu [1]; Johnston, Travis [1]; Young, Steven [1]; Yang, Erika [1]; Heller, William T. [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1649508
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Applied Crystallography (Online)
Additional Journal Information:
Journal Name: Journal of Applied Crystallography (Online); Journal Volume: 53; Journal Issue: 2; Journal ID: ISSN 1600-5767
Publisher:
International Union of Crystallography
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; small-angle scattering data; machine learning; modeling; SasView

Citation Formats

Archibald, Richard, Doucet, Mathieu, Johnston, Travis, Young, Steven, Yang, Erika, and Heller, William T. Classifying and analyzing small-angle scattering data using weighted k nearest neighbors machine learning techniques. United States: N. p., 2020. Web. https://doi.org/10.1107/s1600576720000552.
Archibald, Richard, Doucet, Mathieu, Johnston, Travis, Young, Steven, Yang, Erika, & Heller, William T. Classifying and analyzing small-angle scattering data using weighted k nearest neighbors machine learning techniques. United States. https://doi.org/10.1107/s1600576720000552
Archibald, Richard, Doucet, Mathieu, Johnston, Travis, Young, Steven, Yang, Erika, and Heller, William T. Tue . "Classifying and analyzing small-angle scattering data using weighted k nearest neighbors machine learning techniques". United States. https://doi.org/10.1107/s1600576720000552. https://www.osti.gov/servlets/purl/1649508.
@article{osti_1649508,
title = {Classifying and analyzing small-angle scattering data using weighted k nearest neighbors machine learning techniques},
author = {Archibald, Richard and Doucet, Mathieu and Johnston, Travis and Young, Steven and Yang, Erika and Heller, William T.},
abstractNote = {A consistent challenge for both new and expert practitioners of small-angle scattering (SAS) lies in determining how to analyze the data, given the limited information content of said data and the large number of models that can be employed. Machine learning (ML) methods are powerful tools for classifying data that have found diverse applications in many fields of science. Here, ML methods are applied to the problem of classifying SAS data for the most appropriate model to use for data analysis. The approach employed is built around the method of weighted k nearest neighbors (wKNN), and utilizes a subset of the models implemented in the SasView package (https://www.sasview.org/) for generating a well defined set of training and testing data. The prediction rate of the wKNN method implemented here using a subset of SasView models is reasonably good for many of the models, but has difficulty with others, notably those based on spherical structures. A novel expansion of the wKNN method was also developed, which uses Gaussian processes to produce local surrogate models for the classification, and this significantly improves the classification accuracy. Further, by integrating a stochastic gradient descent method during post-processing, it is possible to leverage the local surrogate model both to classify the SAS data with high accuracy and to predict the structural parameters that best describe the data. The linking of data classification and model fitting has the potential to facilitate the translation of measured data into results for both novice and expert practitioners of SAS.},
doi = {10.1107/s1600576720000552},
journal = {Journal of Applied Crystallography (Online)},
number = 2,
volume = 53,
place = {United States},
year = {2020},
month = {2}
}
