Text-mined dataset of inorganic materials synthesis recipes
Abstract
Materials discovery has become significantly facilitated and accelerated by high-throughput ab-initio computations. This ability to rapidly design interesting novel compounds has displaced the materials innovation bottleneck to the development of synthesis routes for the desired material. As there is no fundamental theory for materials synthesis, one might attempt a data-driven approach for predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database containing synthesis processes. To overcome this limitation, we have generated a dataset of “codified recipes” for solid-state synthesis automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs by using text mining and natural language processing approaches. Every entry contains information about target material, starting compounds, operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.
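As a rough illustration of what one "codified recipe" might look like in code, the sketch below models an entry with the fields the abstract names (target material, starting compounds, operations with conditions, and a balanced reaction). The class names, field names, and the `is_balanced` helper are hypothetical and do not reflect the published dataset's actual schema; the heating temperature is a made-up illustrative value, and the formula parser handles only simple formulas without parentheses.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Operation:
    # Hypothetical: an operation type plus free-form conditions.
    kind: str
    conditions: dict = field(default_factory=dict)


@dataclass
class Recipe:
    # Hypothetical layout mirroring the fields named in the abstract.
    target: str
    precursors: list
    operations: list
    reaction: dict  # {"left": {formula: coeff}, "right": {formula: coeff}}


def count_atoms(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'BaCO3' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_balanced(reaction: dict) -> bool:
    """Return True when both sides contain the same count of every element."""
    def side_total(side: dict) -> Counter:
        total = Counter()
        for formula, coeff in side.items():
            for element, n in count_atoms(formula).items():
                total[element] += coeff * n
        return total

    return side_total(reaction["left"]) == side_total(reaction["right"])


# Illustrative entry: BaCO3 + TiO2 -> BaTiO3 + CO2
recipe = Recipe(
    target="BaTiO3",
    precursors=["BaCO3", "TiO2"],
    operations=[
        Operation("mixing"),
        Operation("heating", {"temperature_C": 1000}),  # made-up value
    ],
    reaction={
        "left": {"BaCO3": 1, "TiO2": 1},
        "right": {"BaTiO3": 1, "CO2": 1},
    },
)
print(is_balanced(recipe.reaction))
```

Keeping the reaction as per-side coefficient maps makes an element-count check like this straightforward; a real pipeline would also need to handle parentheses, hydrates, and non-integer stoichiometry.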
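- Authors:
- Kononova, Olga; Huo, Haoyan; He, Tanjin; Rong, Ziqin; Botari, Tiago; Sun, Wenhao; Tshitoyan, Vahe; Ceder, Gerbrand
- Publication Date:
- 2019-10-15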
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE; USDOE Office of Energy Efficiency and Renewable Energy (EERE), Vehicle Technologies Office (EE-3V); National Science Foundation (NSF)
- OSTI Identifier:
- 1619609
- Alternate Identifier(s):
- OSTI ID: 1580948
- Grant/Contract Number:
- AC02-05CH11231; N00014-14-1-0444; 1534340
- Resource Type:
- Published Article
- Journal Name:
- Scientific Data
- Additional Journal Information:
- Journal Name: Scientific Data; Journal Volume: 6; Journal Issue: 1; Journal ID: ISSN 2052-4463
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United Kingdom
- Language:
- English
- Subject:
- 96 KNOWLEDGE MANAGEMENT AND PRESERVATION
Citation Formats
Kononova, Olga, Huo, Haoyan, He, Tanjin, Rong, Ziqin, Botari, Tiago, Sun, Wenhao, Tshitoyan, Vahe, and Ceder, Gerbrand. Text-mined dataset of inorganic materials synthesis recipes. United Kingdom: N. p., 2019. Web. doi:10.1038/s41597-019-0224-1.
Kononova, Olga, Huo, Haoyan, He, Tanjin, Rong, Ziqin, Botari, Tiago, Sun, Wenhao, Tshitoyan, Vahe, & Ceder, Gerbrand. Text-mined dataset of inorganic materials synthesis recipes. United Kingdom. https://doi.org/10.1038/s41597-019-0224-1
Kononova, Olga, Huo, Haoyan, He, Tanjin, Rong, Ziqin, Botari, Tiago, Sun, Wenhao, Tshitoyan, Vahe, and Ceder, Gerbrand. 2019. "Text-mined dataset of inorganic materials synthesis recipes". United Kingdom. https://doi.org/10.1038/s41597-019-0224-1.
@article{osti_1619609,
title = {Text-mined dataset of inorganic materials synthesis recipes},
author = {Kononova, Olga and Huo, Haoyan and He, Tanjin and Rong, Ziqin and Botari, Tiago and Sun, Wenhao and Tshitoyan, Vahe and Ceder, Gerbrand},
abstractNote = {Materials discovery has become significantly facilitated and accelerated by high-throughput ab-initio computations. This ability to rapidly design interesting novel compounds has displaced the materials innovation bottleneck to the development of synthesis routes for the desired material. As there is no fundamental theory for materials synthesis, one might attempt a data-driven approach for predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database containing synthesis processes. To overcome this limitation, we have generated a dataset of “codified recipes” for solid-state synthesis automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs by using text mining and natural language processing approaches. Every entry contains information about target material, starting compounds, operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.},
doi = {10.1038/s41597-019-0224-1},
journal = {Scientific Data},
number = 1,
volume = 6,
place = {United Kingdom},
year = {2019},
month = oct
}