OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction

Journal Article · Scientific Data
 [1];  [2]
  1. Univ. of Cambridge, Cambridge (United Kingdom)
  2. Univ. of Cambridge, Cambridge (United Kingdom); STFC Rutherford Appleton Lab., Oxfordshire (United Kingdom); Argonne National Lab. (ANL), Argonne, IL (United States)

This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Records processed with the text-mining toolkit ChemDataExtractor were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. Consequently, its machine learning component can now train with ≤500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm affords 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for tackling big-data materials science initiatives and provides a basis for magnetic materials discovery.
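As a minimal sketch of the kind of query the abstract describes, the Python snippet below filters JSON records by transition type and temperature. The field names (`compound`, `type`, `value`, `unit`) and the helper `filter_records` are hypothetical and do not reflect the published schema. The four sample entries are well-known textbook values used only as placeholders; they are not drawn from the database itself.

```python
import json

# Hypothetical record layout: these field names are assumptions for
# illustration, not the schema of the published database.
SAMPLE_JSON = """
[
  {"compound": "Fe",  "type": "Curie", "value": 1043.0, "unit": "K"},
  {"compound": "Ni",  "type": "Curie", "value": 627.0,  "unit": "K"},
  {"compound": "NiO", "type": "Neel",  "value": 525.0,  "unit": "K"},
  {"compound": "MnO", "type": "Neel",  "value": 118.0,  "unit": "K"}
]
"""


def filter_records(records, transition, min_kelvin):
    """Return records of the given transition type at or above min_kelvin.

    Only records already expressed in kelvin are considered; unit
    conversion is deliberately left out of this sketch.
    """
    return [
        r for r in records
        if r["type"] == transition
        and r["unit"] == "K"
        and r["value"] >= min_kelvin
    ]


records = json.loads(SAMPLE_JSON)

# Example query: compounds with a Curie temperature of at least 700 K.
high_tc = filter_records(records, "Curie", 700.0)
print([r["compound"] for r in high_tc])
```

The same filter could be pointed at the CSV or MongoDB versions by swapping the loading step, provided the actual field names are substituted for the placeholders used here.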

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
Engineering and Physical Sciences Research Council (EPSRC); USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22). Materials Sciences & Engineering Division
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
1460724
Journal Information:
Scientific Data, Vol. 5; ISSN 2052-4463
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 53 works
Citation information provided by
Web of Science

References (15)

Information Retrieval and Text Mining Technologies for Chemistry journal May 2017
MAGNDATA : towards a database of magnetic structures. II. The incommensurate case journal October 2016
ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature journal October 2016
Accelerated computational discovery of high-performance materials for organic photovoltaics by means of cheminformatics journal January 2011
A Statistical Approach to Mechanized Encoding and Searching of Literary Information journal October 1957
MAGNDATA : towards a database of magnetic structures. I. The commensurate case journal September 2016
ChemicalTagger: A tool for semantic text-mining in chemistry journal May 2011
Chemical named entities recognition: a review on approaches and applications journal April 2014
Machine-learned and codified synthesis parameters of oxide materials journal September 2017
Learning from the Harvard Clean Energy Project: The Use of Neural Networks to Accelerate Materials Discovery journal September 2015
Commentary: The Materials Project: A materials genome approach to accelerating materials innovation journal July 2013
Snowball: extracting relations from large plain-text collections conference January 2000
TextRunner: open information extraction on the web conference January 2007
  • Yates, Alexander; Cafarella, Michael; Banko, Michele
  • Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations on XX - NAACL '07 https://doi.org/10.3115/1614164.1614177
Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction [Supplemental Data] dataset June 2018
ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature. text January 2016

Cited By (3)

A reference set of curated biomedical data and metadata from clinical case reports journal November 2018
A picture is worth a thousand words: applying natural language processing tools for creating a quantum materials database map journal October 2019
Growing field of materials informatics: databases and artificial intelligence journal January 2020

