Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction
Abstract
This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Therein, records processed with the text-mining toolkit, ChemDataExtractor, were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. Consequently, its machine learning component can now train with ≤500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm affords 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for tackling big-data materials science initiatives and provides a basis for magnetic materials discovery.
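As a rough illustration of the querying workflow the abstract mentions, the following Python sketch loads CSV-style records and filters them by transition type and temperature. The column names (`compound`, `type`, `value`, `unit`) and the three sample rows are hypothetical placeholders chosen for this example. They are not the real schema or real entries of the published database, and the four columns only loosely mirror the idea of a quaternary relationship.

```python
import csv
import io

# Placeholder data: column names and rows are assumptions made for this
# sketch, not the actual layout or contents of the published database.
SAMPLE_CSV = """compound,type,value,unit
ExampleA,Curie,300,K
ExampleB,Néel,120,K
ExampleC,Curie,45,K
"""


def load_records(handle):
    """Parse CSV rows into dicts, converting the temperature to a float."""
    records = []
    for row in csv.DictReader(handle):
        row["value"] = float(row["value"])
        records.append(row)
    return records


def filter_records(records, transition, min_kelvin):
    """Keep records of one transition type at or above a threshold (in K)."""
    return [
        r for r in records
        if r["type"] == transition and r["value"] >= min_kelvin
    ]


records = load_records(io.StringIO(SAMPLE_CSV))
high_curie = filter_records(records, "Curie", 100.0)
```

With a real export, `io.StringIO(SAMPLE_CSV)` would be swapped for an open file handle, and the column names adjusted to whatever the downloaded file actually uses.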
- Authors:
- Court, Callum J.; Cole, Jacqueline M.
- Univ. of Cambridge, Cambridge (United Kingdom)
- Univ. of Cambridge, Cambridge (United Kingdom); STFC Rutherford Appleton Lab., Oxfordshire (United Kingdom); Argonne National Lab. (ANL), Argonne, IL (United States)
- Publication Date:
- 2018-06-19
- Research Org.:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- Engineering and Physical Sciences Research Council (EPSRC); USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22). Materials Sciences & Engineering Division
- OSTI Identifier:
- 1460724
- Grant/Contract Number:
- AC02-06CH11357
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Scientific Data
- Additional Journal Information:
- Journal Volume: 5; Journal ID: ISSN 2052-4463
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 36 MATERIALS SCIENCE
Citation Formats
Court, Callum J., and Cole, Jacqueline M. Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction. United States: N. p., 2018. Web. doi:10.1038/sdata.2018.111.
Court, Callum J., & Cole, Jacqueline M. Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction. United States. https://doi.org/10.1038/sdata.2018.111
Court, Callum J., and Cole, Jacqueline M. 2018. "Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction". United States. https://doi.org/10.1038/sdata.2018.111. https://www.osti.gov/servlets/purl/1460724.
@article{osti_1460724,
title = {Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction},
author = {Court, Callum J. and Cole, Jacqueline M.},
abstractNote = {This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Therein, records processed with the text-mining toolkit, ChemDataExtractor, were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. Consequently, its machine learning component can now train with ≤500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm affords 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for tackling big-data materials science initiatives and provides a basis for magnetic materials discovery.},
doi = {10.1038/sdata.2018.111},
journal = {Scientific Data},
volume = 5,
place = {United States},
year = {2018},
month = {6}
}