DOE PAGES

Title: Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction

This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Within the database, records processed with the text-mining toolkit ChemDataExtractor were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. Consequently, its machine learning component can now train with ≤500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm affords 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for big-data materials science initiatives and provides a basis for magnetic materials discovery.
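As a rough illustration of how records in the JSON format might be queried from Python, the sketch below filters a small in-memory sample by transition type and temperature. The field names (`compound`, `type`, `value`, `unit`), the four-part record layout, and the sample entries are all assumptions made for illustration; they are placeholders, not the published schema or real database values.

```python
import json

# Hypothetical sample: field names and values are illustrative placeholders,
# NOT the actual schema or contents of the published database.
SAMPLE_JSON = """
[
  {"compound": "CompoundA", "type": "Curie", "value": 300.0, "unit": "K"},
  {"compound": "CompoundB", "type": "Néel", "value": 120.0, "unit": "K"},
  {"compound": "CompoundC", "type": "Curie", "value": 650.0, "unit": "K"}
]
"""


def filter_records(records, transition, min_kelvin):
    """Return records of one transition type at or above a temperature in kelvin.

    Records whose unit is not "K" are skipped rather than converted.
    """
    return [
        r for r in records
        if r["type"] == transition
        and r["unit"] == "K"
        and r["value"] >= min_kelvin
    ]


records = json.loads(SAMPLE_JSON)

# Curie-type records with a transition temperature at or above 293.15 K.
above_room_temperature = filter_records(records, "Curie", 293.15)

for record in above_room_temperature:
    print(record["compound"], record["value"], record["unit"])
```

With the real files, the sample string would be replaced by the contents of the downloaded JSON file, and the keys adjusted to whatever field names the database actually uses.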
Authors:
 [1] ;  [2]
  1. Univ. of Cambridge, Cambridge (United Kingdom)
  2. Univ. of Cambridge, Cambridge (United Kingdom); STFC Rutherford Appleton Lab., Oxfordshire (United Kingdom); Argonne National Lab. (ANL), Argonne, IL (United States)
Publication Date:
Grant/Contract Number:
AC02-06CH11357
Type:
Accepted Manuscript
Journal Name:
Scientific Data
Additional Journal Information:
Journal Volume: 5; Journal ID: ISSN 2052-4463
Publisher:
Nature Publishing Group
Research Org:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org:
Engineering and Physical Sciences Research Council (EPSRC); USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22). Materials Sciences & Engineering Division
Country of Publication:
United States
Language:
English
Subject:
36 MATERIALS SCIENCE
OSTI Identifier:
1460724