DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction

Abstract

This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Therein, records processed with the text-mining toolkit, ChemDataExtractor, were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. Consequently, its machine learning component can now train with ≤500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm affords 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for tackling big-data materials science initiatives and provides a basis for magnetic materials discovery.
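As a rough illustration of what querying such records could look like in Python, the sketch below loads a few JSON records and filters them by transition type. The field names (`compound`, `type`, `value`, `unit`) and every record in `SAMPLE_JSON` are hypothetical placeholders chosen for this example; they are not the published schema or data, so the keys would need to be adapted to the actual files.

```python
import json
from statistics import mean

# Hypothetical records: the field names and values below are placeholders
# for illustration only, not entries from the published database.
SAMPLE_JSON = """
[
  {"compound": "compound_A", "type": "Curie", "value": 300.0, "unit": "K"},
  {"compound": "compound_B", "type": "Curie", "value": 150.0, "unit": "K"},
  {"compound": "compound_C", "type": "Neel",  "value": 90.0,  "unit": "K"},
  {"compound": "compound_A", "type": "Curie", "value": 301.0, "unit": "K"}
]
"""


def load_records(text):
    """Parse a JSON array of records into a list of dicts."""
    return json.loads(text)


def filter_records(records, transition, min_kelvin=None):
    """Keep records of one transition type, reported in kelvin,
    optionally at or above a minimum temperature."""
    selected = []
    for record in records:
        if record["type"] != transition:
            continue
        if record["unit"] != "K":
            continue
        if min_kelvin is not None and record["value"] < min_kelvin:
            continue
        selected.append(record)
    return selected


def mean_by_compound(records):
    """Average the reported temperatures for each compound, since the
    same compound may appear in several extracted records."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["compound"], []).append(record["value"])
    return {name: mean(values) for name, values in grouped.items()}


records = load_records(SAMPLE_JSON)
curie = filter_records(records, "Curie")
curie_means = mean_by_compound(curie)
```

Averaging per compound is only one possible way to reconcile repeated records; with an estimated overall precision of 73%, a median or a manual check of outliers may be a safer choice for real analyses.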

Authors:
Court, Callum J. [1]; Cole, Jacqueline M. [2]
  1. Univ. of Cambridge, Cambridge (United Kingdom)
  2. Univ. of Cambridge, Cambridge (United Kingdom); STFC Rutherford Appleton Lab., Oxfordshire (United Kingdom); Argonne National Lab. (ANL), Argonne, IL (United States)
Publication Date:
2018-06-19
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
Engineering and Physical Sciences Research Council (EPSRC); USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22). Materials Sciences & Engineering Division
OSTI Identifier:
1460724
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Accepted Manuscript
Journal Name:
Scientific Data
Additional Journal Information:
Journal Volume: 5; Journal ID: ISSN 2052-4463
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
36 MATERIALS SCIENCE

Citation Formats

Court, Callum J., and Cole, Jacqueline M. Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction. United States: N. p., 2018. Web. doi:10.1038/sdata.2018.111.
Court, Callum J., & Cole, Jacqueline M. Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction. United States. https://doi.org/10.1038/sdata.2018.111
Court, Callum J., and Cole, Jacqueline M. 2018. "Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction". United States. https://doi.org/10.1038/sdata.2018.111. https://www.osti.gov/servlets/purl/1460724.
@article{osti_1460724,
title = {Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction},
author = {Court, Callum J. and Cole, Jacqueline M.},
abstractNote = {This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Therein, records processed with the text-mining toolkit, ChemDataExtractor, were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. Consequently, its machine learning component can now train with ≤500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm affords 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for tackling big-data materials science initiatives and provides a basis for magnetic materials discovery.},
doi = {10.1038/sdata.2018.111},
journal = {Scientific Data},
volume = 5,
place = {United States},
year = {2018},
month = jun
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 53 works
Citation information provided by
Web of Science

Works referenced in this record:

Information Retrieval and Text Mining Technologies for Chemistry
journal, May 2017


MAGNDATA: towards a database of magnetic structures. II. The incommensurate case
journal, October 2016

  • Gallego, Samuel V.; Perez-Mato, J. Manuel; Elcoro, Luis
  • Journal of Applied Crystallography, Vol. 49, Issue 6
  • DOI: 10.1107/S1600576716015491

ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature
journal, October 2016

  • Swain, Matthew C.; Cole, Jacqueline M.
  • Journal of Chemical Information and Modeling, Vol. 56, Issue 10
  • DOI: 10.1021/acs.jcim.6b00207

Accelerated computational discovery of high-performance materials for organic photovoltaics by means of cheminformatics
journal, January 2011

  • Olivares-Amaya, Roberto; Amador-Bedolla, Carlos; Hachmann, Johannes
  • Energy & Environmental Science, Vol. 4, Issue 12
  • DOI: 10.1039/c1ee02056k

A Statistical Approach to Mechanized Encoding and Searching of Literary Information
journal, October 1957

  • Luhn, H. P.
  • IBM Journal of Research and Development, Vol. 1, Issue 4
  • DOI: 10.1147/rd.14.0309

MAGNDATA: towards a database of magnetic structures. I. The commensurate case
journal, September 2016

  • Gallego, Samuel V.; Perez-Mato, J. Manuel; Elcoro, Luis
  • Journal of Applied Crystallography, Vol. 49, Issue 5
  • DOI: 10.1107/S1600576716012863

ChemicalTagger: A tool for semantic text-mining in chemistry
journal, May 2011

  • Hawizy, Lezan; Jessop, David M.; Adams, Nico
  • Journal of Cheminformatics, Vol. 3, Issue 1
  • DOI: 10.1186/1758-2946-3-17

Chemical named entities recognition: a review on approaches and applications
journal, April 2014


Machine-learned and codified synthesis parameters of oxide materials
journal, September 2017


Learning from the Harvard Clean Energy Project: The Use of Neural Networks to Accelerate Materials Discovery
journal, September 2015

  • Pyzer-Knapp, Edward O.; Li, Kewei; Aspuru-Guzik, Alan
  • Advanced Functional Materials, Vol. 25, Issue 41
  • DOI: 10.1002/adfm.201501919

Commentary: The Materials Project: A materials genome approach to accelerating materials innovation
journal, July 2013

  • Jain, Anubhav; Ong, Shyue Ping; Hautier, Geoffroy
  • APL Materials, Vol. 1, Issue 1
  • DOI: 10.1063/1.4812323

Snowball: extracting relations from large plain-text collections
conference, January 2000

  • Agichtein, Eugene; Gravano, Luis
  • Proceedings of the fifth ACM conference on Digital libraries - DL '00
  • DOI: 10.1145/336597.336644

TextRunner: open information extraction on the web
conference, January 2007

  • Yates, Alexander; Cafarella, Michael; Banko, Michele
  • Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations - NAACL '07
  • DOI: 10.3115/1614164.1614177

Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction [Supplemental Data]
dataset, June 2018

  • Court, Callum James; Cole, Jacqueline M.
  • figshare-Supplementary information for journal article at DOI: 10.1038/sdata.2018.111, 1 ZIP file (7.18 MB)
  • DOI: 10.6084/m9.figshare.c.3954418

ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature.
text, January 2016

  • Swain, Matthew C.; Cole, Jacqui
  • Apollo - University of Cambridge Repository
  • DOI: 10.17863/cam.10935

Works referencing / citing this record:

A reference set of curated biomedical data and metadata from clinical case reports
journal, November 2018

  • Caufield, J. Harry; Zhou, Yijiang; Garlid, Anders O.
  • Scientific Data, Vol. 5, Issue 1
  • DOI: 10.1038/sdata.2018.258

A picture is worth a thousand words: applying natural language processing tools for creating a quantum materials database map
journal, October 2019

  • Venugopal, Vineeth; Broderick, Scott R.; Rajan, Krishna
  • MRS Communications, Vol. 9, Issue 4
  • DOI: 10.1557/mrc.2019.136

Growing field of materials informatics: databases and artificial intelligence
journal, January 2020

  • Lopez-Bezanilla, Alejandro; Littlewood, Peter B.
  • MRS Communications, Vol. 10, Issue 1
  • DOI: 10.1557/mrc.2020.2