Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction
- Univ. of Cambridge, Cambridge (United Kingdom)
- Univ. of Cambridge, Cambridge (United Kingdom); STFC Rutherford Appleton Lab., Oxfordshire (United Kingdom); Argonne National Lab. (ANL), Argonne, IL (United States)
This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Records processed with the text-mining toolkit ChemDataExtractor were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. Consequently, its machine learning component can now train with ≤500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm afford 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for big-data materials science initiatives and provides a basis for magnetic materials discovery.
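Since the abstract notes that the JSON export can be read from Python, a minimal sketch of such a query is given here. The field names (`compound`, `type`, `value`, `units`) are assumptions made for illustration and may not match the released schema. The three sample records are hand-written textbook values, not entries taken from the database.

```python
import json

# Hypothetical record layout: the real export may use different field names.
# The sample values are well-known textbook figures, NOT database records.
SAMPLE_JSON = """
[
  {"compound": "Fe", "type": "Curie", "value": 1043.0, "units": "K"},
  {"compound": "Ni", "type": "Curie", "value": 627.0, "units": "K"},
  {"compound": "Cr", "type": "Neel", "value": 311.0, "units": "K"}
]
"""


def load_records(text):
    """Parse a JSON array of records into a list of dicts."""
    return json.loads(text)


def filter_records(records, transition, min_kelvin):
    """Return records of one transition type at or above min_kelvin."""
    return [
        r for r in records
        if r["type"] == transition
        and r["units"] == "K"
        and r["value"] >= min_kelvin
    ]


records = load_records(SAMPLE_JSON)
high_tc = filter_records(records, "Curie", 700.0)
print([r["compound"] for r in high_tc])
```

With a real export, `SAMPLE_JSON` would be replaced by the contents of the downloaded file, and the keys adjusted to whatever the file actually uses.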
- Research Organization:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization:
- Engineering and Physical Sciences Research Council (EPSRC); USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22). Materials Sciences & Engineering Division
- Grant/Contract Number:
- AC02-06CH11357
- OSTI ID:
- 1460724
- Journal Information:
- Scientific Data, Vol. 5; ISSN 2052-4463
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United States
- Language:
- English
Web of Science
A reference set of curated biomedical data and metadata from clinical case reports (journal, November 2018)
A picture is worth a thousand words: applying natural language processing tools for creating a quantum materials database map (journal, October 2019)
Growing field of materials informatics: databases and artificial intelligence (journal, January 2020)
Similar Records
PDFDataExtractor: A Tool for Reading Scientific Text and Interpreting Metadata from the Typeset Literature in the Portable Document Format
A database of refractive indices and dielectric constants auto-generated using ChemDataExtractor