OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning

Journal Article · Bioinformatics

Abstract

Motivation
Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and cannot populate arbitrarily complex, nested knowledge schemas.

Results
Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a knowledge extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and to return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease relationships. SPIRES accuracy is currently comparable to the mid-range of existing relation extraction methods, but it greatly surpasses an LLM's native ability to ground entities with unique identifiers. SPIRES is easy to customize and flexible, and, crucially, it can perform new tasks without any new training data. This method supports a general strategy of leveraging the language-interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation against publicly available databases and ontologies external to the LLM.

Availability and implementation
SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
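The recursive interrogation and grounding steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OntoGPT's implementation or API: the schema layout, the prompt wording, the `key: value` reply format, the semicolon splitting of nested values, the `EX:` placeholder identifiers, and the `AUTO:` prefix for ungrounded labels are all hypothetical, and `fake_llm` stands in for a real LLM by returning canned replies.

```python
from typing import Callable, Dict, Tuple

# Hypothetical schema: class -> {attribute: (range, description)}.
# Range "str" keeps text; a range naming a class in SCHEMA triggers recursion;
# any other range is treated as a named-entity type to be grounded.
SCHEMA: Dict[str, Dict[str, Tuple[str, str]]] = {
    "Recipe": {
        "label": ("str", "the name of the recipe"),
        "ingredients": ("Ingredient", "semicolon-separated list of ingredients"),
    },
    "Ingredient": {
        "food": ("Food", "the food item"),
        "amount": ("str", "the quantity of the food item"),
    },
}

# Hypothetical vocabulary with placeholder identifiers (not real ontology terms).
VOCAB: Dict[str, Dict[str, str]] = {
    "Food": {"flour": "EX:0001", "sugar": "EX:0002"},
}


def build_prompt(cls: str, text: str) -> str:
    """Request one 'attribute: value' line per attribute of cls (assumed format)."""
    fields = "\n".join(f"{a}: <{d}>" for a, (_, d) in SCHEMA[cls].items())
    return f"Extract the following fields from the text.\n{fields}\n\nText:\n{text}"


def parse(response: str) -> Dict[str, str]:
    """Parse 'key: value' lines; lines without a colon are ignored."""
    out: Dict[str, str] = {}
    for line in response.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            out[key.strip()] = value.strip()
    return out


def ground(label: str, rng: str) -> str:
    """Look up an identifier outside the LLM; mark unmatched labels with AUTO:."""
    return VOCAB.get(rng, {}).get(label.lower(), f"AUTO:{label}")


def extract(text: str, cls: str, llm: Callable[[str], str]) -> dict:
    """Prompt once for cls, then recurse into nested classes and ground entities."""
    raw = parse(llm(build_prompt(cls, text)))
    result: dict = {}
    for attr, (rng, _) in SCHEMA[cls].items():
        if attr not in raw:
            continue
        value = raw[attr]
        if rng == "str":
            result[attr] = value
        elif rng in SCHEMA:
            # Nested class: run a new round of prompting on each returned span.
            result[attr] = [extract(part.strip(), rng, llm)
                            for part in value.split(";") if part.strip()]
        else:
            result[attr] = ground(value, rng)
    return result


# Hypothetical canned replies keyed by the text being interrogated.
CANNED = {
    "Pancakes. Mix 2 cups flour and 1 tbsp sugar.":
        "label: Pancakes\ningredients: 2 cups flour; 1 tbsp sugar",
    "2 cups flour": "food: flour\namount: 2 cups",
    "1 tbsp sugar": "food: sugar\namount: 1 tbsp",
}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM: reply based on the text after 'Text:'."""
    return CANNED[prompt.rsplit("Text:\n", 1)[1].strip()]


if __name__ == "__main__":
    print(extract("Pancakes. Mix 2 cups flour and 1 tbsp sugar.", "Recipe", fake_llm))
```

Running the file prints a nested record whose ingredients are produced by a second round of prompting on each returned span, with each food label replaced by an identifier. Keeping the lookup outside the model mirrors the abstract's point that identifiers are validated against resources external to the LLM.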

Sponsoring Organization:
USDOE
Grant/Contract Number:
AC0205CH11231
OSTI ID:
2322444
Journal Information:
Bioinformatics, Vol. 40, Issue 3; ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English

Similar Records

The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species
Journal Article · November 24, 2023 · Nucleic Acids Research

A rule-free workflow for the automated generation of databases from scientific literature
Journal Article · December 13, 2023 · npj Computational Materials

Disease Ontology 2015 update: An expanded and updated database of human diseases for linking biomedical knowledge through disease data
Journal Article · October 27, 2014 · Nucleic Acids Research