DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Protein–Protein Interaction Networks Derived from Classical and Machine Learning-Based Natural Language Processing Tools

Journal Article · Journal of Proteome Research

The study of protein–protein interactions (PPIs) provides insight into various biological mechanisms, including the binding of antibodies to antigens, enzymes to inhibitors or promoters, and receptors to ligands. Recent studies of PPIs have led to significant biological breakthroughs. For example, the study of PPIs involved in the human:SARS-CoV-2 viral infection mechanism aided in the development of the SARS-CoV-2 vaccines. Though several databases exist for the manual curation of PPI networks, text mining methods have been routinely demonstrated to be useful alternatives for newly studied or understudied species where databases are incomplete. Here, the relationship extraction (RE) performance of several open-source classical text processing, machine learning (ML)-based natural language processing (NLP), and large language model (LLM)-based NLP tools was compared. Overall, our results indicated that networks derived from classical methods tend to have high true positive rates at the expense of being overconnected, ML-based NLP methods have lower true positive rates but produce networks with the closest structures to the target network, and LLM-based NLP methods tend to fall in between the two other approaches, with variable performance. Finally, the selection of a specific NLP approach should be tied to the needs of a study and to text availability, as model performance varied with the amount of text provided.
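
The trade-off described in the abstract, a high true positive rate paired with an overconnected network, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy protein names, the helper functions `normalize`, `true_positive_rate`, and `density`, and the use of edge density as a stand-in for "overconnectedness". It is not the evaluation procedure used in the article.

```python
# Illustrative sketch only: toy data and hypothetical helpers,
# not the metrics or code used in the article.

def normalize(edges):
    """Treat interactions as undirected: store each pair as a frozenset
    and drop self-interactions."""
    return {frozenset(pair) for pair in edges if pair[0] != pair[1]}

def true_positive_rate(predicted, target):
    """Fraction of target interactions that the predicted network recovers."""
    if not target:
        return 0.0
    return len(predicted & target) / len(target)

def density(edges):
    """Edges present divided by edges possible among the nodes that appear."""
    nodes = set().union(*edges) if edges else set()
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1) / 2)

# Hypothetical curated target network on four proteins.
target = normalize([("A", "B"), ("B", "C"), ("C", "D")])

# A network that links every pair: recovers all true edges but is overconnected.
overconnected = normalize([
    ("A", "B"), ("B", "C"), ("C", "D"),
    ("A", "C"), ("A", "D"), ("B", "D"),
])

# A sparser network: misses one true edge but adds no spurious ones.
sparse = normalize([("A", "B"), ("C", "D")])

for name, net in [("overconnected", overconnected), ("sparse", sparse)]:
    print(name, true_positive_rate(net, target), density(net), density(target))
```

In this toy case the fully connected network recovers every target edge while having twice the target's density, whereas the sparser network misses one edge but introduces none. A real comparison would likely use richer structural measures than density alone.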

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Laboratory Directed Research and Development (LDRD) Program
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
2483344
Report Number(s):
PNNL-SA--199871
Journal Information:
Journal Name: Journal of Proteome Research; Journal Volume: 23; Journal Issue: 12; ISSN 1535-3893
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English
