DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Semi-supervised machine-learning classification of materials synthesis procedures

Abstract

Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as “grinding” and “heating”, “dissolving” and “centrifuging”, etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning method offers a scalable way to unlock the large amount of inorganic materials synthesis information in the literature and to process it into a standardized, machine-readable database.
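The Markov chain step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `transition_probabilities`, the `START`/`END` padding states, and the three example procedures are all assumptions made for illustration. Given ordered lists of synthesis steps, it counts how often one step follows another and normalizes the counts into first-order transition probabilities, which is the kind of table a flowchart of possible procedures could be drawn from.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from step sequences."""
    counts = defaultdict(Counter)
    for steps in sequences:
        # Pad with START/END (an assumption of this sketch) so the chain
        # also captures how procedures begin and finish.
        padded = ["START", *steps, "END"]
        for current, following in zip(padded, padded[1:]):
            counts[current][following] += 1
    # Normalize each row of counts so the outgoing probabilities sum to 1.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical step sequences, not data from the paper.
procedures = [
    ["grinding", "heating"],
    ["grinding", "heating", "grinding", "heating"],
    ["dissolving", "heating", "centrifuging"],
]

chain = transition_probabilities(procedures)
for state, followers in chain.items():
    print(state, followers)
```

In this toy data, "grinding" is always followed by "heating", while "heating" can end the procedure or lead to another step. Edges with higher probability would be the dominant arrows in a reconstructed flowchart.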

Authors:
 Huo, Haoyan [1]; Rong, Ziqin [2]; Kononova, Olga [2]; Sun, Wenhao [3]; Botari, Tiago [1]; He, Tanjin [1]; Tshitoyan, Vahe [3]; Ceder, Gerbrand [2]
  1. Univ. of California, Berkeley, CA (United States). Dept. of Materials Science and Engineering; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Materials Sciences Division
  2. Univ. of California, Berkeley, CA (United States). Dept. of Materials Science and Engineering
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Materials Sciences Division
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1559267
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
npj Computational Materials
Additional Journal Information:
Journal Volume: 5; Journal Issue: 1; Journal ID: ISSN 2057-3960
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
36 MATERIALS SCIENCE

Citation Formats

Huo, Haoyan, Rong, Ziqin, Kononova, Olga, Sun, Wenhao, Botari, Tiago, He, Tanjin, Tshitoyan, Vahe, and Ceder, Gerbrand. Semi-supervised machine-learning classification of materials synthesis procedures. United States: N. p., 2019. Web. doi:10.1038/s41524-019-0204-1.
Huo, Haoyan, Rong, Ziqin, Kononova, Olga, Sun, Wenhao, Botari, Tiago, He, Tanjin, Tshitoyan, Vahe, & Ceder, Gerbrand. Semi-supervised machine-learning classification of materials synthesis procedures. United States. doi:10.1038/s41524-019-0204-1.
Huo, Haoyan, Rong, Ziqin, Kononova, Olga, Sun, Wenhao, Botari, Tiago, He, Tanjin, Tshitoyan, Vahe, and Ceder, Gerbrand. Mon . "Semi-supervised machine-learning classification of materials synthesis procedures". United States. doi:10.1038/s41524-019-0204-1. https://www.osti.gov/servlets/purl/1559267.
@article{osti_1559267,
title = {Semi-supervised machine-learning classification of materials synthesis procedures},
author = {Huo, Haoyan and Rong, Ziqin and Kononova, Olga and Sun, Wenhao and Botari, Tiago and He, Tanjin and Tshitoyan, Vahe and Ceder, Gerbrand},
abstractNote = {Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked-up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as “grinding” and “heating”, “dissolving” and “centrifuging”, etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach enables a scalable approach to unlock the large amount of inorganic materials synthesis information from the literature and to process it into a standardized, machine-readable database.},
doi = {10.1038/s41524-019-0204-1},
journal = {npj Computational Materials},
number = 1,
volume = 5,
place = {United States},
year = {2019},
month = {7}
}

Citation Metrics:
Cited by: 2 works
Citation information provided by
Web of Science
