Unsupervised learning approaches to characterizing heterogeneous samples using X-ray single-particle imaging
- Max Planck Institute for the Structure and Dynamics of Matter, Hamburg (Germany); Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany). Center for Free-Electron Laser Science
- Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany). Center for Free-Electron Laser Science
- European XFEL, Schenefeld (Germany)
- National Univ. of Singapore (Singapore)
- Uppsala Univ. (Sweden)
- Max Planck Institute for the Structure and Dynamics of Matter, Hamburg (Germany); Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany). Center for Free-Electron Laser Science; European XFEL, Schenefeld (Germany); Univ. of Southampton (United Kingdom)
- SLAC National Accelerator Lab., Menlo Park, CA (United States). Linac Coherent Light Source (LCLS)
- Arizona State Univ., Tempe, AZ (United States)
- Univ. of Hamburg (Germany)
- Max Planck Institute for the Structure and Dynamics of Matter, Hamburg (Germany); Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany). Center for Free-Electron Laser Science; Univ. of Hamburg (Germany)
- Univ. of Melbourne, VIC (Australia)
- Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany). Center for Free-Electron Laser Science; Univ. of Hamburg (Germany)
- Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany). Center for Free-Electron Laser Science; European XFEL, Schenefeld (Germany)
- Uppsala Univ. (Sweden); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
- Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany). Center for Free-Electron Laser Science; Univ. of Hamburg (Germany); Radboud Univ., Nijmegen (Netherlands)
- European XFEL, Schenefeld (Germany); La Trobe Univ., Melbourne, VIC (Australia)
One of the outstanding analytical problems in X-ray single-particle imaging (SPI) is the classification of structural heterogeneity, which is especially difficult given the low signal-to-noise ratios of individual patterns and the fact that even identical objects can yield patterns that vary greatly when orientation is taken into consideration. Proposed here are two methods which explicitly account for this orientation-induced variation and can robustly determine the structural landscape of a sample ensemble. The first, termed common-line principal component analysis (PCA), provides a rough classification which is essentially parameter free and can be run automatically on any SPI dataset. The second method, utilizing variational auto-encoders (VAEs), can generate 3D structures of the objects at any point in the structural landscape. Both methods are implemented in combination with the noise-tolerant expand–maximize–compress (EMC) algorithm, and their utility is demonstrated by applying them to an experimental dataset from gold nanoparticles with only a few thousand photons per pattern. Both discrete structural classes and continuous deformations are recovered. These developments diverge from previous approaches of extracting reproducible subsets of patterns from a dataset and open up the possibility of moving beyond the study of homogeneous sample sets to addressing open questions on topics such as nanocrystal growth and dynamics, as well as phase transitions which have not been externally triggered.
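The idea of a rough, nearly parameter-free classification of photon-sparse patterns can be illustrated with a toy sketch. The example below is not the common-line PCA described in the abstract: it applies ordinary PCA (via NumPy's SVD) to synthetic patterns, and it sidesteps the orientation problem entirely by assuming rotationally symmetric objects (uniform spheres of two radii). The function names, radii, detector size, q-scale, and photon count are all hypothetical choices made for illustration.

```python
import numpy as np


def sphere_intensity(q, radius):
    """Diffraction intensity of a uniform sphere (orientation independent)."""
    x = q * radius
    # Sphere form-factor amplitude: 3 (sin x - x cos x) / x^3
    amp = 3.0 * (np.sin(x) - x * np.cos(x)) / x**3
    return amp**2


def simulate_patterns(rng, radii, n_per_class, photons=2000, size=32):
    """Poisson-sampled toy patterns, one class per sphere radius (assumed values)."""
    # Pixel coordinates centred between pixels, so q is never exactly zero.
    yy, xx = np.indices((size, size)) - (size - 1) / 2.0
    q = 0.05 * np.hypot(xx, yy)
    patterns, labels = [], []
    for label, radius in enumerate(radii):
        template = sphere_intensity(q, radius)
        template *= photons / template.sum()  # mean of `photons` counts per pattern
        patterns.append(rng.poisson(template, size=(n_per_class, size, size)))
        labels.append(np.full(n_per_class, label))
    return np.concatenate(patterns), np.concatenate(labels)


def pca_scores(patterns, n_components=2):
    """Project flattened, mean-centred patterns onto the leading principal axes."""
    flat = patterns.reshape(len(patterns), -1).astype(float)
    centered = flat - flat.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
patterns, labels = simulate_patterns(rng, radii=(8.0, 12.0), n_per_class=200)
scores = pca_scores(patterns)

# The sign of a principal component is arbitrary, so accept either labelling.
predicted = (scores[:, 0] > 0).astype(int)
accuracy = max(np.mean(predicted == labels), np.mean(predicted != labels))
print(f"PC1 sign separates the two toy classes with accuracy {accuracy:.3f}")
```

For real SPI data the objects are not spherically symmetric, so pattern-to-pattern differences caused by orientation would dominate a plain PCA like this one; that is the complication the two methods in the abstract are stated to account for.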
- Research Organization:
- SLAC National Accelerator Lab., Menlo Park, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Basic Energy Sciences (BES); German Research Foundation (DFG); European Research Council (ERC); National Science Foundation (NSF); Swedish Research Council (SRC); Human Frontiers Science Program, France
- Grant/Contract Number:
- AC02-76SF00515; 194651731; 390715994; ERC-614507-Küpper; STC-1231306; RGP0010/2017
- OSTI ID:
- 1871648
- Journal Information:
- IUCrJ, Vol. 9, Issue 2; ISSN 2052-2525
- Publisher:
- International Union of Crystallography
- Country of Publication:
- United States
- Language:
- English
Similar Records
Diffraction data from aerosolized Coliphage PR772 virus particles imaged with the Linac Coherent Light Source
Finding simplicity: unsupervised discovery of features, patterns, and order parameters via shift-invariant variational autoencoders