Multi-component background learning automates signal detection for spectroscopic data
Abstract
Automated experimentation has yielded data acquisition rates that exceed human processing capabilities. Artificial intelligence offers new possibilities for automating data interpretation to generate large, high-quality datasets. Background subtraction is a long-standing challenge, particularly in settings where multiple sources of background signal coexist, and automatic extraction of signals of interest from measured signals accelerates data interpretation. Herein, we present an unsupervised probabilistic learning approach that analyzes large data collections to identify multiple background sources and establish the probability that any given data point contains a signal of interest. The approach is demonstrated on X-ray diffraction and Raman spectroscopy data and is suitable for any type of data where the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require knowledge of the signals, since the shapes of the background signals, the noise levels, and the signal of interest are simultaneously learned via a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets, a transformative capability with many applications in the physical sciences and beyond.
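The idea that the signal of interest is a positive addition to a small number of shared background shapes can be illustrated with a short sketch. This is not the model from the paper: it swaps the probabilistic matrix factorization for a much simpler stand-in, a truncated SVD with soft imputation and a two-component residual mixture (Gaussian noise versus a uniform positive excess). The function name, the `rank` and `prior` parameters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def background_and_signal_probability(X, rank=2, prior=0.05, n_iter=30):
    """Illustrative stand-in only, not the published model.

    X     : (n_spectra, n_channels) array of measured intensities
    rank  : assumed number of background sources (hypothetical parameter)
    prior : assumed prior probability that a point contains signal
    """
    X = np.asarray(X, dtype=float)
    tiny = np.finfo(float).tiny
    filled = X.copy()  # data with suspected signal points replaced by background
    for _ in range(n_iter):
        # Low-rank background estimate from the current filled-in data.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        background = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        resid = X - background
        # Robust noise level from the median absolute deviation.
        sigma = max(1.4826 * np.median(np.abs(resid - np.median(resid))), 1e-12)
        # Residual mixture: zero-mean Gaussian noise versus a positive
        # excess that is uniform on (0, largest residual].
        noise_pdf = np.exp(-0.5 * (resid / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        signal_pdf = np.where(resid > 0, 1.0 / max(resid.max(), tiny), 0.0)
        p_signal = prior * signal_pdf / (
            prior * signal_pdf + (1 - prior) * noise_pdf + tiny
        )
        # Soft imputation: points likely to contain signal are replaced by
        # the background estimate before the next low-rank fit.
        filled = p_signal * background + (1 - p_signal) * X
    return background, p_signal, sigma

# Synthetic example (assumed data): two smooth background shapes with random
# weights, a narrow positive peak in roughly 30% of spectra, Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 300)
shapes = np.vstack([np.exp(-x / 0.3), 1.0 + x])
weights = rng.uniform(0.5, 2.0, size=(200, 2))
has_peak = rng.random(200) < 0.3
peak = np.exp(-0.5 * ((x - 0.5) / 0.01) ** 2)
X = (
    weights @ shapes
    + np.outer(has_peak.astype(float), peak)
    + rng.normal(0.0, 0.02, size=(200, 300))
)
background, p_signal, sigma = background_and_signal_probability(X, rank=2)
```

Here `p_signal` has the same shape as `X` and plays the role of the per-point probability of containing signal, while `sigma` is a single global noise estimate. The one-sided mixture encodes the positivity assumption: a negative residual is never attributed to signal.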
- Sponsoring Organization:
- USDOE
- OSTI ID:
- 1619644
- Journal Information:
- npj Computational Materials, Vol. 5, Issue 1; ISSN 2057-3960
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United Kingdom
- Language:
- English
Similar Records
Development of Gamma Background Radiation Digital Twin with Machine Learning Algorithms: Application of Unsupervised Machine Learning to Detection of Anomalies and Nuisances in Gamma Background Radiation Environmental Screening Data
Unsupervised word embeddings capture latent knowledge from materials science literature