OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Semi-supervised machine-learning classification of materials synthesis procedures

Journal Article · npj Computational Materials
 [1]; [2]; [2]; [3]; [1]; [1]; [3]; [2]
  1. Univ. of California, Berkeley, CA (United States). Dept. of Materials Science and Engineering; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Materials Sciences Division
  2. Univ. of California, Berkeley, CA (United States). Dept. of Materials Science and Engineering
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Materials Sciences Division

Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as “grinding” and “heating” or “dissolving” and “centrifuging”. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach offers a scalable way to unlock the large amount of inorganic materials synthesis information in the literature and to process it into a standardized, machine-readable database.
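
The Markov chain step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the step sequences are invented toy data, and the names `transition_probabilities`, `most_likely_path`, `START`, and `END` are hypothetical. The sketch assumes each synthesis paragraph has already been reduced to an ordered list of step labels (the output of the topic-modeling stage), estimates first-order transition probabilities between steps, and then reads off the most probable route through the resulting flowchart.

```python
from collections import Counter, defaultdict

# Hypothetical sentinel states marking the start and end of a procedure.
START, END = "<start>", "<end>"


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from step sequences."""
    counts = defaultdict(Counter)
    for steps in sequences:
        path = [START, *steps, END]
        # Count every observed (current step -> next step) pair.
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[current] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


def most_likely_path(probs, max_steps=20):
    """Greedily follow the most probable transition from START until END."""
    path, state = [], START
    for _ in range(max_steps):  # bound the walk in case of cycles
        state = max(probs[state], key=probs[state].get)
        if state == END:
            break
        path.append(state)
    return path


# Toy data, invented for illustration only (not taken from the paper).
toy_sequences = [
    ["grinding", "heating", "cooling"],
    ["grinding", "heating", "grinding", "heating", "cooling"],
    ["dissolving", "heating", "centrifuging", "drying"],
]

probs = transition_probabilities(toy_sequences)
for state, row in probs.items():
    for nxt, p in sorted(row.items(), key=lambda kv: -kv[1]):
        print(f"{state} -> {nxt}: {p:.2f}")
print(most_likely_path(probs))
```

Each row of `probs` is one node of the flowchart with weighted outgoing edges. The greedy walk is only one way to summarize the chain; a real pipeline would likely estimate the transitions separately for each synthesis category.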

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1559267
Journal Information:
npj Computational Materials, Vol. 5, Issue 1; ISSN 2057-3960
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 53 works
Citation information provided by
Web of Science

Cited By (1)

Text-mined dataset of inorganic materials synthesis recipes journal October 2019