DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning

Abstract

In the past several years, Materials Genome Initiative (MGI) efforts have produced myriad examples of computationally designed materials in the fields of energy storage, catalysis, thermoelectrics, and hydrogen storage as well as large data resources that are used to screen for potentially transformative compounds. The bottleneck in high-throughput materials design has thus shifted to materials synthesis, which motivates our development of a methodology to automatically compile materials synthesis parameters across tens of thousands of scholarly publications using natural language processing techniques. To demonstrate our framework's capabilities, we examine the synthesis conditions for various metal oxides across more than 12 thousand manuscripts. We then apply machine learning methods to predict the critical parameters needed to synthesize titania nanotubes via hydrothermal methods and verify this result against known mechanisms. Lastly, we demonstrate the capacity for transfer learning by using machine learning models to predict synthesis outcomes on materials systems not included in the training set and thereby outperform heuristic strategies.

Authors:
Kim, Edward [1]; Huang, Kevin [1]; Saunders, Adam [2]; McCallum, Andrew [2]; Ceder, Gerbrand [3]; Olivetti, Elsa [1]
  1. Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
  2. Univ. of Massachusetts, Amherst, MA (United States)
  3. Univ. of California, Berkeley, CA (United States)
Publication Date:
2017-10-19
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
OSTI Identifier:
1476572
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Chemistry of Materials
Additional Journal Information:
Journal Volume: 29; Journal Issue: 21; Related Information: © 2017 American Chemical Society.; Journal ID: ISSN 0897-4756
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English
Subject:
36 MATERIALS SCIENCE

Citation Formats

Kim, Edward, Huang, Kevin, Saunders, Adam, McCallum, Andrew, Ceder, Gerbrand, and Olivetti, Elsa. Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning. United States: N. p., 2017. Web. doi:10.1021/acs.chemmater.7b03500.
Kim, Edward, Huang, Kevin, Saunders, Adam, McCallum, Andrew, Ceder, Gerbrand, & Olivetti, Elsa. Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning. United States. https://doi.org/10.1021/acs.chemmater.7b03500
Kim, Edward, Huang, Kevin, Saunders, Adam, McCallum, Andrew, Ceder, Gerbrand, and Olivetti, Elsa. 2017. "Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning". United States. https://doi.org/10.1021/acs.chemmater.7b03500. https://www.osti.gov/servlets/purl/1476572.
@article{osti_1476572,
title = {Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning},
author = {Kim, Edward and Huang, Kevin and Saunders, Adam and McCallum, Andrew and Ceder, Gerbrand and Olivetti, Elsa},
abstractNote = {In the past several years, Materials Genome Initiative (MGI) efforts have produced myriad examples of computationally designed materials in the fields of energy storage, catalysis, thermoelectrics, and hydrogen storage as well as large data resources that are used to screen for potentially transformative compounds. The bottleneck in high-throughput materials design has thus shifted to materials synthesis, which motivates our development of a methodology to automatically compile materials synthesis parameters across tens of thousands of scholarly publications using natural language processing techniques. To demonstrate our framework's capabilities, we examine the synthesis conditions for various metal oxides across more than 12 thousand manuscripts. We then apply machine learning methods to predict the critical parameters needed to synthesize titania nanotubes via hydrothermal methods and verify this result against known mechanisms. Lastly, we demonstrate the capacity for transfer learning by using machine learning models to predict synthesis outcomes on materials systems not included in the training set and thereby outperform heuristic strategies.},
doi = {10.1021/acs.chemmater.7b03500},
journal = {Chemistry of Materials},
number = 21,
volume = 29,
place = {United States},
year = {2017},
month = {10}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 238 works
Citation information provided by
Web of Science

Works referenced in this record:

The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies
journal, December 2015


Commentary: The Materials Project: A materials genome approach to accelerating materials innovation
journal, July 2013

  • Jain, Anubhav; Ong, Shyue Ping; Hautier, Geoffroy
  • APL Materials, Vol. 1, Issue 1
  • DOI: 10.1063/1.4812323

The Harvard Clean Energy Project: Large-Scale Computational Screening and Design of Organic Photovoltaics on the World Community Grid
journal, August 2011

  • Hachmann, Johannes; Olivares-Amaya, Roberto; Atahan-Evrenk, Sule
  • The Journal of Physical Chemistry Letters, Vol. 2, Issue 17
  • DOI: 10.1021/jz200866s

Finding Nature’s Missing Ternary Oxide Compounds Using Machine Learning and Density Functional Theory
journal, June 2010

  • Hautier, Geoffroy; Fischer, Christopher C.; Jain, Anubhav
  • Chemistry of Materials, Vol. 22, Issue 12
  • DOI: 10.1021/cm100795d

From Organized High-Throughput Data to Phenomenological Theory using Machine Learning: The Example of Dielectric Breakdown
journal, February 2016


The high-throughput highway to computational materials design
journal, February 2013

  • Curtarolo, Stefano; Hart, Gus L. W.; Nardelli, Marco Buongiorno
  • Nature Materials, Vol. 12, Issue 3
  • DOI: 10.1038/nmat3568

Big–deep–smart data in imaging for guiding materials design
journal, September 2015

  • Kalinin, Sergei V.; Sumpter, Bobby G.; Archibald, Richard K.
  • Nature Materials, Vol. 14, Issue 10
  • DOI: 10.1038/nmat4395

The Materials Super Highway: Integrating High-Throughput Experimentation into Mapping the Catalysis Materials Genome
journal, November 2014


Materials science with large-scale data and informatics: Unlocking new opportunities
journal, May 2016

  • Hill, Joanne; Mulholland, Gregory; Persson, Kristin
  • MRS Bulletin, Vol. 41, Issue 5
  • DOI: 10.1557/mrs.2016.93

Computational predictions of energy materials using density functional theory
journal, January 2016


Machine Learning Strategy for Accelerated Design of Polymer Dielectrics
journal, February 2016

  • Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan
  • Scientific Reports, Vol. 6, Issue 1
  • DOI: 10.1038/srep20952

Performance and resource considerations of Li-ion battery electrode materials
journal, January 2015

  • Ghadbeigi, Leila; Harada, Jaye K.; Lettiere, Bethany R.
  • Energy & Environmental Science, Vol. 8, Issue 6
  • DOI: 10.1039/C5EE00685F

High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds
journal, October 2016


Machine Learning and Statistical Analysis for Materials Science: Stability and Transferability of Fingerprint Descriptors and Chemical Insights
journal, May 2017


The Materials Genome Initiative: One year on
journal, August 2012


Materials Informatics: The Materials “Gene” and Big Data
journal, July 2015


Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach
journal, August 2016

  • Gómez-Bombarelli, Rafael; Aguilera-Iparraguirre, Jorge; Hirzel, Timothy D.
  • Nature Materials, Vol. 15, Issue 10
  • DOI: 10.1038/nmat4717

In silico screening of carbon-capture materials
journal, May 2012

  • Lin, Li-Chiang; Berger, Adam H.; Martin, Richard L.
  • Nature Materials, Vol. 11, Issue 7
  • DOI: 10.1038/nmat3336

Finding MOFs for Highly Selective CO2/N2 Adsorption Using Materials Screening Based on Efficient Assignment of Atomic Point Charges
journal, February 2012

  • Haldoupis, Emmanuel; Nair, Sankar; Sholl, David S.
  • Journal of the American Chemical Society, Vol. 134, Issue 9
  • DOI: 10.1021/ja2108239

CrossRef Text and Data Mining Services
journal, July 2015

  • Lammey, Rachael
  • Insights: the UKSG journal, Vol. 28, Issue 2
  • DOI: 10.1629/uksg.233

ChemicalTagger: A tool for semantic text-mining in chemistry
journal, May 2011

  • Hawizy, Lezan; Jessop, David M.; Adams, Nico
  • Journal of Cheminformatics, Vol. 3, Issue 1
  • DOI: 10.1186/1758-2946-3-17

ChemSpot: a hybrid system for chemical named entity recognition
journal, April 2012


ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature
journal, October 2016

  • Swain, Matthew C.; Cole, Jacqueline M.
  • Journal of Chemical Information and Modeling, Vol. 56, Issue 10
  • DOI: 10.1021/acs.jcim.6b00207

An Improved Non-monotonic Transition System for Dependency Parsing
conference, January 2015

  • Honnibal, Matthew; Johnson, Mark
  • Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
  • DOI: 10.18653/v1/D15-1162

GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles
journal, June 2001


Using natural language processing techniques to inform research on nanotechnology
journal, January 2015

  • Lewinski, Nastassja A.; McInnes, Bridget T.
  • Beilstein Journal of Nanotechnology, Vol. 6
  • DOI: 10.3762/bjnano.6.149

A Hybrid Human-computer Approach to the Extraction of Scientific Facts from the Literature
journal, January 2016


Combinatorial and High-Throughput Screening of Materials Libraries: Review of State of the Art
journal, August 2011

  • Potyrailo, Radislav; Rajan, Krishna; Stoewe, Klaus
  • ACS Combinatorial Science, Vol. 13, Issue 6
  • DOI: 10.1021/co200007w

Strategy for the maximum extraction of information generated from combinatorial experimentation of Co-doped ZnO thin films
journal, January 2011


The energy landscape concept and its implications for synthesis planning
journal, June 2014


Machine-learning-assisted materials discovery using failed experiments
journal, May 2016

  • Raccuglia, Paul; Elbert, Katherine C.; Adler, Philip D. F.
  • Nature, Vol. 533, Issue 7601
  • DOI: 10.1038/nature17439

Data-Driven Review of Thermoelectric Materials: Performance and Resource Considerations
journal, May 2013

  • Gaultois, Michael W.; Sparks, Taylor D.; Borg, Christopher K. H.
  • Chemistry of Materials, Vol. 25, Issue 15
  • DOI: 10.1021/cm400893e

Machine-learned and codified synthesis parameters of oxide materials
journal, September 2017


PubChem Substance and Compound databases
journal, September 2015

  • Kim, Sunghwan; Thiessen, Paul A.; Bolton, Evan E.
  • Nucleic Acids Research, Vol. 44, Issue D1
  • DOI: 10.1093/nar/gkv951

Zinc oxide nanostructures: growth, properties and applications
journal, June 2004


Room-temperature coexistence of large electric polarization and magnetic order in BiFeO3 single crystals
journal, July 2007


Kinetics and Mechanisms of Hydrothermal Synthesis of Barium Titanate
journal, November 1996


Novel Lithium-Ion Cathode Materials Based on Layered Manganese Oxides
journal, July 2001


Enhanced Photocleavage of Water Using Titania Nanotube Arrays
journal, January 2005

  • Mor, Gopal K.; Shankar, Karthik; Paulose, Maggie
  • Nano Letters, Vol. 5, Issue 1
  • DOI: 10.1021/nl048301k

TiO2 Nanotubes: Synthesis and Applications
journal, March 2011

  • Roy, Poulomi; Berger, Steffen; Schmuki, Patrik
  • Angewandte Chemie International Edition, Vol. 50, Issue 13
  • DOI: 10.1002/anie.201001374

Effects of pH on the microstructures and photocatalytic activity of mesoporous nanocrystalline titania powders prepared via hydrothermal method
journal, October 2006


Phase and morphological transitions of titania/titanate nanostructures from an acid to an alkali hydrothermal environment
journal, January 2013

  • Zhao, Bin; Lin, Lin; He, Dannong
  • J. Mater. Chem. A, Vol. 1, Issue 5
  • DOI: 10.1039/C2TA00755J

Study on composition, structure and formation process of nanotube Na2Ti2O4(OH)2
journal, January 2003

  • Yang, Jianjun; Jin, Zhensheng; Wang, Xiaodong
  • Dalton Transactions, Issue 20
  • DOI: 10.1039/b305585j

Structural Features of Nanotubes Synthesized from NaOH Treatment on TiO2 with Different Post-Treatments
journal, January 2006

  • Tsai, Chien-Cheng; Teng, Hsisheng
  • Chemistry of Materials, Vol. 18, Issue 2
  • DOI: 10.1021/cm0518527

The effect of hydrothermal conditions on the mesoporous structure of TiO2 nanotubes
journal, January 2004

  • Bavykin, Dmitry V.; Parmon, Valentin N.; Lapkin, Alexei A.
  • Journal of Materials Chemistry, Vol. 14, Issue 22
  • DOI: 10.1039/b406378c

The ferroelectric and cubic phases in BaTiO3 ferroelectrics are also antiferroelectric
journal, September 2006

  • Zhang, Q.; Cagin, T.; Goddard, W. A.
  • Proceedings of the National Academy of Sciences, Vol. 103, Issue 40
  • DOI: 10.1073/pnas.0606612103

Temperature-Driven Structural Phase Transition in Tetragonal-Like BiFeO3
journal, August 2011

  • Siemons, Wolter; Biegalski, Michael D.; Nam, Joong Hee
  • Applied Physics Express, Vol. 4, Issue 9
  • DOI: 10.1143/APEX.4.095801

Progress, Challenges, and Opportunities in Two-Dimensional Materials Beyond Graphene
journal, March 2013

  • Butler, Sheneve Z.; Hollen, Shawna M.; Cao, Linyou
  • ACS Nano, Vol. 7, Issue 4, p. 2898-2926
  • DOI: 10.1021/nn400280c

The chemistry of two-dimensional layered transition metal dichalcogenide nanosheets
journal, April 2013

  • Chhowalla, Manish; Shin, Hyeon Suk; Eda, Goki
  • Nature Chemistry, Vol. 5, Issue 4, p. 263-275
  • DOI: 10.1038/nchem.1589

An atlas of two-dimensional materials
journal, January 2014

  • Miró, Pere; Audiffred, Martha; Heine, Thomas
  • Chem. Soc. Rev., Vol. 43, Issue 18
  • DOI: 10.1039/C4CS00102H

2D metal carbides and nitrides (MXenes) for energy storage
journal, January 2017


Synthesis of ultrathin CdS nanosheets as efficient visible-light-driven water splitting photocatalysts for hydrogen evolution
journal, January 2013

  • Xu, You; Zhao, Weiwei; Xu, Rui
  • Chemical Communications, Vol. 49, Issue 84
  • DOI: 10.1039/c3cc46342g

Effect of pH on the properties of ZnS thin films grown by chemical bath deposition
journal, April 2006


Combinatorial screening for new materials in unconstrained composition space with machine learning
journal, March 2014


Works referencing / citing this record:

Machine learning for molecular and materials science
journal, July 2018


Data‐driven glass/ceramic science research: Insights from the glass and ceramic and data science/informatics communities
journal, May 2019

  • De Guire, Eileen; Bartolo, Laura; Brindle, Ross
  • Journal of the American Ceramic Society, Vol. 102, Issue 11
  • DOI: 10.1111/jace.16677

Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design
journal, February 2019

  • Lookman, Turab; Balachandran, Prasanna V.; Xue, Dezhen
  • npj Computational Materials, Vol. 5, Issue 1
  • DOI: 10.1038/s41524-019-0153-8

Fast prediction of electron-impact ionization cross sections of large molecules via machine learning
journal, May 2019

  • Zhong, Linlin
  • Journal of Applied Physics, Vol. 125, Issue 18
  • DOI: 10.1063/1.5094500

Data‐Driven Materials Science: Status, Challenges, and Perspectives
journal, September 2019

  • Himanen, Lauri; Geurts, Amber; Foster, Adam Stuart
  • Advanced Science, Vol. 6, Issue 21
  • DOI: 10.1002/advs.201900808

New frontiers for the materials genome initiative
journal, April 2019

  • de Pablo, Juan J.; Jackson, Nicholas E.; Webb, Michael A.
  • npj Computational Materials, Vol. 5, Issue 1
  • DOI: 10.1038/s41524-019-0173-4

Robocrystallographer: automated crystal structure text descriptions and analysis
journal, July 2019


Progress and Perspective: Soft Thermoelectric Materials for Wearable and Internet‐of‐Things Applications
journal, February 2019

  • Zaia, Edmond W.; Gordon, Madeleine P.; Yuan, Pengyu
  • Advanced Electronic Materials, Vol. 5, Issue 11
  • DOI: 10.1002/aelm.201800823

Network analysis of synthesizable materials discovery
journal, May 2019


Machine learning in materials science
journal, August 2019


Statistical Analysis and Discovery of Heterogeneous Catalysts Based on Machine Learning from Diverse Published Data
journal, August 2019


The promise of artificial intelligence in chemical engineering: Is it here, finally?
journal, December 2018


Understanding structural adaptability: a reactant informatics approach to experiment design
journal, January 2018

  • Xu, Rosalind J.; Olshansky, Jacob H.; Adler, Philip D. F.
  • Molecular Systems Design & Engineering, Vol. 3, Issue 3
  • DOI: 10.1039/c7me00127d

Unsupervised word embeddings capture latent knowledge from materials science literature
journal, July 2019


Semi-supervised machine-learning classification of materials synthesis procedures
journal, July 2019


NanoMine schema: An extensible data representation for polymer nanocomposites
journal, November 2018

  • Zhao, He; Wang, Yixing; Lin, Anqi
  • APL Materials, Vol. 6, Issue 11
  • DOI: 10.1063/1.5046839

Machine learning for heterogeneous catalyst design and discovery
journal, May 2018

  • Goldsmith, Bryan R.; Esterhuizen, Jacques; Liu, Jin-Xun
  • AIChE Journal, Vol. 64, Issue 7
  • DOI: 10.1002/aic.16198

The Rise of Catalyst Informatics: Towards Catalyst Genomics
journal, January 2019

  • Takahashi, Keisuke; Takahashi, Lauren; Miyazato, Itsuki
  • ChemCatChem, Vol. 11, Issue 4
  • DOI: 10.1002/cctc.201801956

Nanomaterials Discovery and Design through Machine Learning
journal, January 2019


Symbolic regression in materials science
journal, June 2019

  • Wang, Yiqun; Wagner, Nicholas; Rondinelli, James M.
  • MRS Communications, Vol. 9, Issue 3
  • DOI: 10.1557/mrc.2019.85

Making machine learning a useful tool in the accelerated discovery of transition metal complexes
journal, July 2019

  • Kulik, Heather J.
  • WIREs Computational Molecular Science, Vol. 10, Issue 1
  • DOI: 10.1002/wcms.1439

Modelling of framework materials at multiple scales: current practices and open questions
journal, May 2019

  • Fraux, Guillaume; Chibani, Siwar; Coudert, François-Xavier
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 377, Issue 2149
  • DOI: 10.1098/rsta.2018.0220

Nanoinformatics, and the big challenges for the science of small things
journal, January 2019

  • Barnard, A. S.; Motevalli, B.; Parker, A. J.
  • Nanoscale, Vol. 11, Issue 41
  • DOI: 10.1039/c9nr05912a

Fast-developing machine learning support complex system research in environmental chemistry
journal, January 2020

  • Duan, Qiannan; Lee, Jianchao
  • New Journal of Chemistry, Vol. 44, Issue 4
  • DOI: 10.1039/c9nj05717j

A Critical Review of Machine Learning of Energy Materials
journal, January 2020


Experiment‐Oriented Materials Informatics for Efficient Exploration of Design Strategy and New Compounds for High‐Performance Organic Anode
journal, July 2019

  • Numazawa, Hiromichi; Igarashi, Yasuhiko; Sato, Kosuke
  • Advanced Theory and Simulations, Vol. 2, Issue 10
  • DOI: 10.1002/adts.201900130

Data mining for better material synthesis: The case of pulsed laser deposition of complex oxides
journal, March 2018

  • Young, Steven R.; Maksov, Artem; Ziatdinov, Maxim
  • Journal of Applied Physics, Vol. 123, Issue 11
  • DOI: 10.1063/1.5009942

Materials‐Informatics‐Assisted High‐Yield Synthesis of 2D Nanomaterials through Exfoliation
journal, January 2019

  • Nakada, Gentoku; Igarashi, Yasuhiko; Imai, Hiroaki
  • Advanced Theory and Simulations, Vol. 2, Issue 4
  • DOI: 10.1002/adts.201800180

Data‐Driven Materials Science: Status, Challenges, and Perspectives
journal, November 2019

  • Himanen, Lauri; Geurts, Amber; Foster, Adam Stuart
  • Advanced Science, Vol. 7, Issue 2
  • DOI: 10.1002/advs.201903667

Data Mining for better material synthesis: the case of pulsed laser deposition of complex oxides
text, January 2017


Network analysis of synthesizable materials discovery
text, January 2018


Data-driven materials science: status, challenges and perspectives
text, January 2019


A Machine Learning Approach to Zeolite Synthesis Enabled by Automatic Literature Data Extraction
journal, April 2019


Representing Multiword Chemical Terms through Phrase-Level Preprocessing and Word Embedding
journal, October 2019


Text-mined dataset of inorganic materials synthesis recipes
journal, October 2019