DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Database of Stress-Strain Properties Auto-generated from the Scientific Literature using ChemDataExtractor

Journal Article · Scientific Data

Abstract: There has been an ongoing need for information-rich databases in the mechanical-engineering domain to aid data-driven materials science. To address the lack of suitable property databases, this study employs the latest version of the chemistry-aware natural-language-processing (NLP) toolkit, ChemDataExtractor, to automatically curate a comprehensive materials database of key stress-strain properties. The database contains information about materials and their cognate properties: ultimate tensile strength, yield strength, fracture strength, Young’s modulus, and ductility values. A total of 720,308 data records were extracted from the scientific literature and organized into machine-readable database formats. The extracted data have an overall precision, recall, and F-score of 82.03%, 92.13%, and 86.79%, respectively. The resulting database has been made publicly available to facilitate data-driven research and accelerate advancements within the mechanical-engineering domain.
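As a quick consistency check on the figures quoted in the abstract, the sketch below recomputes the F-score from the reported precision and recall. It assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_score` is purely illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; other weightings
    (F-beta) would give a different value.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
precision, recall = 82.03, 92.13
print(round(f_score(precision, recall), 2))  # 86.79
```

Under that assumption, the recomputed value agrees with the 86.79% stated in the abstract.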

Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
2478567
Journal Information:
Scientific Data, Vol. 11, Issue 1; ISSN 2052-4463
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
