U.S. Department of Energy
Office of Scientific and Technical Information

A thermoelectric materials database auto-generated from the scientific literature using ChemDataExtractor

Journal Article · Scientific Data
 [1];  [2]
  1. Univ. of Cambridge (United Kingdom)
  2. Univ. of Cambridge (United Kingdom); Science and Technology Facilities Council (STFC), Oxford (United Kingdom). Rutherford Appleton Lab., ISIS Neutron Source
An auto-generated thermoelectric-materials database is presented. It contains 22,805 data records, generated automatically from the scientific literature, that span 10,641 unique extracted chemical names. Each record contains a chemical entity and one of five key thermoelectric properties (thermoelectric figure of merit, ZT; thermal conductivity, κ; Seebeck coefficient, S; electrical conductivity, σ; or power factor, PF), linked to its corresponding recorded temperature, T. The database was generated with the automatic sentence-parsing capabilities of ChemDataExtractor 2.0, a chemistry-aware natural-language-processing toolkit, which was adapted to the thermoelectric-materials domain and applied after a rule-based sentence-simplification step. Data were mined from the text of 60,843 scientific papers sourced from three publishers: Elsevier, the Royal Society of Chemistry, and Springer. To the best of our knowledge, this is the first database of thermoelectric materials and their properties generated automatically from the existing literature. The database was evaluated to have a precision of 82.25% and has been made publicly available to facilitate the application of data science to the thermoelectric-materials domain, for analysis, design, and prediction.
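The five properties listed above are not independent. By their standard definitions, PF = S²σ and ZT = S²σT/κ, so records for the same compound at the same temperature can in principle be combined or cross-checked. The sketch below is a minimal, hypothetical illustration of that idea. The `ThermoelectricRecord` class, its field names, the property labels, and the assumption that every value has already been normalized to SI units are illustrative only; they are not the published database schema.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout (not the published schema): one extracted
# property value per record, tied to the temperature at which it was reported.
@dataclass(frozen=True)
class ThermoelectricRecord:
    compound: str         # extracted chemical name
    prop: str             # assumed labels: "ZT", "kappa", "S", "sigma", "PF"
    value: float          # assumed already normalized to SI units
    temperature_k: float  # recorded temperature T, in kelvin


def power_factor(seebeck_v_per_k: float, sigma_s_per_m: float) -> float:
    """PF = S^2 * sigma, in W m^-1 K^-2."""
    return seebeck_v_per_k ** 2 * sigma_s_per_m


def figure_of_merit(seebeck_v_per_k: float, sigma_s_per_m: float,
                    kappa_w_per_m_k: float, temperature_k: float) -> float:
    """ZT = S^2 * sigma * T / kappa (dimensionless)."""
    return power_factor(seebeck_v_per_k, sigma_s_per_m) * temperature_k / kappa_w_per_m_k


def derive_zt(records: list[ThermoelectricRecord]) -> dict[tuple[str, float], float]:
    """Group records by (compound, T) and compute ZT wherever S, sigma and kappa all exist."""
    grouped: dict[tuple[str, float], dict[str, float]] = defaultdict(dict)
    for rec in records:
        grouped[(rec.compound, rec.temperature_k)][rec.prop] = rec.value

    derived: dict[tuple[str, float], float] = {}
    for (compound, temp), props in grouped.items():
        if all(key in props for key in ("S", "sigma", "kappa")):
            derived[(compound, temp)] = figure_of_merit(
                props["S"], props["sigma"], props["kappa"], temp)
    return derived


if __name__ == "__main__":
    # Made-up illustrative values, not entries from the database.
    demo = [
        ThermoelectricRecord("compound_A", "S", 200e-6, 300.0),
        ThermoelectricRecord("compound_A", "sigma", 1e5, 300.0),
        ThermoelectricRecord("compound_A", "kappa", 1.0, 300.0),
    ]
    print(derive_zt(demo))
```

With these made-up inputs (S = 200 µV/K, σ = 10⁵ S/m, κ = 1 W m⁻¹ K⁻¹, T = 300 K), PF comes to about 4 × 10⁻³ W m⁻¹ K⁻² and the derived ZT to about 1.2. Grouping on the exact (compound, T) pair is a deliberately strict choice. Real extracted data would likely need name normalization and a temperature tolerance before records from different papers could be matched.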
Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States). Argonne Leadership Computing Facility (ALCF)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
2423219
Journal Information:
Scientific Data, Vol. 9, Issue 1; ISSN 2052-4463
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
