DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: MaTableGPT: GPT‐Based Table Data Extractor from Materials Science Literature

Journal Article · Advanced Science
[1]; [1]; [2]; [3]; [2]; [2]; [2]; [4]; [5]; [6]; [2]; [2]
  1. Computational Science Research Center, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea; Department of Materials Science and Engineering, Korea University, Seoul 02841, Republic of Korea
  2. Computational Science Research Center, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea
  3. Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
  4. Department of Materials Science and Engineering, Korea University, Seoul 02841, Republic of Korea
  5. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
  6. Materials Science Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA

Abstract: Efficiently extracting data from tables in the scientific literature is pivotal for building large‐scale databases. However, tables in materials science papers take highly diverse forms, which makes rule‐based extraction ineffective. To overcome this challenge, the study presents MaTableGPT, a GPT‐based extractor of table data from the materials science literature. MaTableGPT combines table data representation and table splitting, which improve GPT comprehension, with follow‐up questions that filter out hallucinated information. When applied to a large body of water splitting catalysis literature, MaTableGPT achieves an extraction accuracy (total F1 score) of up to 96.8%. Through comprehensive evaluation of GPT usage cost, labeling cost, and extraction accuracy for the zero‐shot, few‐shot, and fine‐tuning learning methods, the study presents a Pareto‐front mapping in which few‐shot learning is the most balanced solution, owing to both its high extraction accuracy (total F1 score >95%) and low cost (GPT usage cost of 5.97 US dollars and labeling cost of 10 I/O paired examples). Statistical analyses of the database generated by MaTableGPT reveal insights into the distribution of overpotential and elemental utilization across the catalysts reported in the water splitting literature.
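
The Pareto‐front mapping mentioned in the abstract can be illustrated with a minimal sketch. It assumes three objectives: GPT usage cost and labeling cost (lower is better) and total F1 score (higher is better). The `Candidate` type, the function names, and every number below are hypothetical placeholders chosen for illustration; they are not results or code from the study.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One learning setup described by three objectives (all values hypothetical)."""
    name: str
    usage_cost: float    # GPT usage cost in USD; lower is better
    labeling_cost: int   # number of labeled I/O examples; lower is better
    f1: float            # total F1 score; higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on every objective and better on at least one."""
    no_worse = (
        a.usage_cost <= b.usage_cost
        and a.labeling_cost <= b.labeling_cost
        and a.f1 >= b.f1
    )
    strictly_better = (
        a.usage_cost < b.usage_cost
        or a.labeling_cost < b.labeling_cost
        or a.f1 > b.f1
    )
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Placeholder values for illustration only; not taken from the paper.
candidates = [
    Candidate("cheap_unlabeled", 2.0, 0, 0.80),
    Candidate("balanced", 6.0, 10, 0.95),
    Candidate("accurate_expensive", 50.0, 500, 0.97),
    Candidate("dominated", 60.0, 500, 0.90),
]

front_names = [c.name for c in pareto_front(candidates)]
print(front_names)
```

In this sketch a setup stays on the front as long as no alternative is at least as good on all three objectives; choosing a "most balanced" point among the survivors is a separate judgment that the code does not attempt.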

Sponsoring Organization:
USDOE
OSTI ID:
2506940
Journal Information:
Journal Name: Advanced Science; Journal Volume: 12; Journal Issue: 16; ISSN 2198-3844
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
Germany
Language:
English
