DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Text-mined dataset of inorganic materials synthesis recipes

Abstract

Materials discovery has been significantly facilitated and accelerated by high-throughput ab-initio computations. This ability to rapidly design interesting novel compounds has displaced the materials innovation bottleneck to the development of synthesis routes for the desired material. As there is no fundamental theory for materials synthesis, one might attempt a data-driven approach for predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database containing synthesis processes. To overcome this limitation, we have generated a dataset of "codified recipes" for solid-state synthesis automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs by using text mining and natural language processing approaches. Every entry contains information about target material, starting compounds, operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.
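
As an illustration of what a "codified recipe" with a balanced reaction might look like in code, the following is a minimal Python sketch. The class names, field names, reaction string format, and condition values are all assumptions made for this example; they are not the dataset's actual schema or contents. The balance check only handles simple formulas without parentheses or hydrates.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Operation:
    # Hypothetical operation record: a label plus free-form conditions.
    kind: str
    conditions: dict = field(default_factory=dict)


@dataclass
class SynthesisEntry:
    # Hypothetical field names mirroring the items listed in the abstract.
    target: str
    precursors: list
    operations: list
    reaction: str  # assumed format: "A + B == C + D", optional numeric coefficients


def count_atoms(formula):
    """Count atoms in a simple formula such as 'BaCO3' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[element] += float(number) if number else 1.0
    return counts


def side_atoms(side):
    """Sum atom counts over one side of a reaction, honoring coefficients."""
    total = Counter()
    for term in side.split("+"):
        parts = term.split()
        if len(parts) == 2:
            coeff, formula = float(parts[0]), parts[1]
        else:
            coeff, formula = 1.0, parts[0]
        for element, n in count_atoms(formula).items():
            total[element] += coeff * n
    return total


def is_balanced(reaction, tol=1e-9):
    """Return True if every element appears equally on both sides."""
    lhs, rhs = reaction.split("==")
    left, right = side_atoms(lhs), side_atoms(rhs)
    return all(abs(left[e] - right[e]) <= tol for e in set(left) | set(right))


# Illustrative entry; the conditions are placeholder values, not dataset records.
entry = SynthesisEntry(
    target="BaTiO3",
    precursors=["BaCO3", "TiO2"],
    operations=[
        Operation("mixing"),
        Operation("heating", {"temperature_C": 1100, "time_h": 12}),
    ],
    reaction="BaCO3 + TiO2 == BaTiO3 + CO2",
)

print(entry.target, is_balanced(entry.reaction))
```

A check like `is_balanced` is one way a user could sanity-test reaction strings after loading the data; the published dataset may encode reactions differently, so the parsing would need to be adapted to its real format.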

Authors:
Kononova, Olga [1]; Huo, Haoyan [2]; He, Tanjin [2]; Rong, Ziqin [3]; Botari, Tiago [4]; Sun, Wenhao [3]; Tshitoyan, Vahe [5]; Ceder, Gerbrand [2]
  1. Univ. of California, Berkeley, CA (United States)
  2. Univ. of California, Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  4. Univ. of Sao Paulo, Sao Carlos, SP (Brazil); Univ. of California, Berkeley, CA (United States)
  5. Google LLC, Mountain View, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Vehicle Technologies Office (EE-3V); National Science Foundation (NSF)
OSTI Identifier:
1580948
Grant/Contract Number:  
AC02-05CH11231; N00014-14-1-0444; 1534340
Resource Type:
Accepted Manuscript
Journal Name:
Scientific Data
Additional Journal Information:
Journal Volume: 6; Journal Issue: 1; Journal ID: ISSN 2052-4463
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION

Citation Formats

Kononova, Olga, Huo, Haoyan, He, Tanjin, Rong, Ziqin, Botari, Tiago, Sun, Wenhao, Tshitoyan, Vahe, and Ceder, Gerbrand. Text-mined dataset of inorganic materials synthesis recipes. United States: N. p., 2019. Web. doi:10.1038/s41597-019-0224-1.
Kononova, Olga, Huo, Haoyan, He, Tanjin, Rong, Ziqin, Botari, Tiago, Sun, Wenhao, Tshitoyan, Vahe, & Ceder, Gerbrand. Text-mined dataset of inorganic materials synthesis recipes. United States. doi:10.1038/s41597-019-0224-1.
Kononova, Olga, Huo, Haoyan, He, Tanjin, Rong, Ziqin, Botari, Tiago, Sun, Wenhao, Tshitoyan, Vahe, and Ceder, Gerbrand. Tue . "Text-mined dataset of inorganic materials synthesis recipes". United States. doi:10.1038/s41597-019-0224-1. https://www.osti.gov/servlets/purl/1580948.
@article{osti_1580948,
title = {Text-mined dataset of inorganic materials synthesis recipes},
author = {Kononova, Olga and Huo, Haoyan and He, Tanjin and Rong, Ziqin and Botari, Tiago and Sun, Wenhao and Tshitoyan, Vahe and Ceder, Gerbrand},
abstractNote = {Materials discovery has been significantly facilitated and accelerated by high-throughput ab-initio computations. This ability to rapidly design interesting novel compounds has displaced the materials innovation bottleneck to the development of synthesis routes for the desired material. As there is no fundamental theory for materials synthesis, one might attempt a data-driven approach for predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database containing synthesis processes. To overcome this limitation, we have generated a dataset of "codified recipes" for solid-state synthesis automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs by using text mining and natural language processing approaches. Every entry contains information about target material, starting compounds, operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.},
doi = {10.1038/s41597-019-0224-1},
journal = {Scientific Data},
number = 1,
volume = 6,
place = {United States},
year = {2019},
month = {10}
}

Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science
