DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: An evaluation of GPT models for phenotype concept recognition

Journal Article · BMC Medical Informatics and Decision Making (Online)
 [1];  [2];  [3];  [4];  [5];  [6];  [2];  [2]
  1. Perth Children’s Hospital (Australia); Telethon Kids Institute (Australia); Curtin Univ., Perth, WA (Australia); SingHealth Duke-NUS Institute of Precision Medicine (Singapore)
  2. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
  3. King Edward Memorial Hospital (Australia)
  4. Perth Children’s Hospital (Australia); Telethon Kids Institute (Australia); King Edward Memorial Hospital (Australia); University of Western Australia (Australia)
  5. University of Colorado, Aurora, CO (United States)
  6. Jackson Laboratory for Genomic Medicine, Farmington, CT (United States); University of Connecticut, Farmington, CT (United States)

Clinical deep phenotyping and phenotype annotation play a critical role both in the diagnosis of patients with rare disorders and in building computationally tractable knowledge in the rare disorders field. These processes rely on ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (usually supported by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift toward large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other of clinical observations. The best run, using in-context learning, achieved a document-level F1 score of 0.58 on publication abstracts and 0.75 on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best-in-class tool. Without in-context learning, however, performance is significantly below that of existing approaches. Our experiments show that gpt-4.0 surpasses state-of-the-art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.
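The document-level F1 score and the run-to-run concordance mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the scores were aggregated, so the micro-averaging below, the Jaccard overlap used for concordance, the function names and the toy annotations are all assumptions made for illustration, not the authors' evaluation code.

```python
from typing import Dict, Set, Tuple


def document_level_prf(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-document concept ID sets.

    Assumption: each document is reduced to the set of ontology IDs found in it,
    so repeated mentions of the same concept count once.
    """
    tp = fp = fn = 0
    for doc_id, gold_ids in gold.items():
        pred_ids = predicted.get(doc_id, set())
        tp += len(gold_ids & pred_ids)   # concepts correctly recognised
        fp += len(pred_ids - gold_ids)   # concepts predicted but not in gold
        fn += len(gold_ids - pred_ids)   # gold concepts that were missed
    # Predictions for documents absent from the gold standard are false positives.
    for doc_id, pred_ids in predicted.items():
        if doc_id not in gold:
            fp += len(pred_ids)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def run_concordance(run_a: Set[str], run_b: Set[str]) -> float:
    """Jaccard overlap between the concept sets returned by two runs (an assumed measure)."""
    union = run_a | run_b
    return len(run_a & run_b) / len(union) if union else 1.0


# Toy, hand-made annotations (hypothetical documents; not taken from the corpora).
gold = {
    "doc1": {"HP:0001250", "HP:0001263"},
    "doc2": {"HP:0000252"},
}
predicted = {
    "doc1": {"HP:0001250", "HP:0004322"},
    "doc2": {"HP:0000252"},
}

precision, recall, f1 = document_level_prf(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"concordance={run_concordance(predicted['doc1'], gold['doc1']):.2f}")
```

Comparing sets of concept IDs per document, rather than text spans, is what separates a document-level score from a mention-level one; a mention-level variant would additionally have to match character offsets.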

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
2470704
Journal Information:
Journal Name: BMC Medical Informatics and Decision Making (Online); Journal Volume: 24; Journal Issue: 1; ISSN 1472-6947
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English