OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Advances in scientific literature mining for interpreting materials characterization

Journal Article · Machine Learning: Science and Technology

Abstract: Using synchrotron light sources, such as the National Synchrotron Light Source II at Brookhaven National Laboratory, scientists in fields as diverse as physics, biology, and materials science identify the atomic structure, chemical composition, or other important properties of varied specimens. X-ray spectroscopy from light sources is particularly valuable for materials research, with vast information about reference spectra available in the scientific literature. However, because the technique is applicable to many science domains, searching for information about select x-ray spectroscopy spectra is impeded by the sheer number of publications. Moreover, useful information about the context of an experiment or the figures presented in papers can be buried among the details, which takes time to assess. This work presents a scientific literature mining system that supports data acquisition, information extraction, and user interaction for referencing x-ray spectra identification and spectral interpretation. The goal is to provide efficient access to useful spectral data for researchers who may spend only a few days at a synchrotron light source. With this system, users browse a classification tree for papers arranged according to x-ray spectroscopic methods, chemical elements, and x-ray absorption spectroscopy edges. Relevant figures are extracted with sentences from the paper that explain them, known as ‘figure explanatory text.’ Notably, this system focuses on semantic aspects (logical analysis) to find figure explanatory text using deep contextualized word embedding techniques and contains an interface to obtain labeled data from domain experts that is used to evaluate and improve the model.
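
The idea of pairing a figure with its ‘figure explanatory text’ can be illustrated with a minimal sketch. This is not the system described in the abstract: the paper uses deep contextualized word embeddings, whereas the sketch below swaps in plain token-count vectors so that it runs with the Python standard library alone. The function names (`embed`, `cosine`, `rank_explanatory_sentences`) and the example caption and sentences are hypothetical and were invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase token counts.

    Assumption: a real system would call a contextual embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_explanatory_sentences(caption, sentences, top_k=2):
    """Return the top_k (score, sentence) pairs most similar to the caption."""
    caption_vec = embed(caption)
    scored = [(cosine(caption_vec, embed(s)), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Invented example text, for illustration only.
caption = "Figure 2. Ni K-edge XANES spectra of the oxide samples."
sentences = [
    "The samples were synthesized by a sol-gel route.",
    "The Ni K-edge XANES spectra in Figure 2 show a shift of the edge position.",
    "We thank the beamline staff for their support.",
]
ranked = rank_explanatory_sentences(caption, sentences, top_k=1)
print(ranked[0][1])
```

Replacing `embed` with a call to a contextual sentence-embedding model would leave the ranking logic unchanged, which is why the similarity step is kept separate from the embedding step.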

Sponsoring Organization:
USDOE
Grant/Contract Number:
Laboratory Directed Research and Development 18-05
OSTI ID:
1835465
Journal Information:
Machine Learning: Science and Technology, Vol. 2, Issue 4; ISSN 2632-2153
Publisher:
IOP Publishing
Country of Publication:
United Kingdom
Language:
English
