Title: DIPS-Plus: The enhanced database of interacting protein structures for interface prediction

Journal Article · Scientific Data

Abstract: In this work, we expand on a dataset recently introduced for protein interface prediction (PIP), the Database of Interacting Protein Structures (DIPS), to present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for machine learning of protein interfaces. While the original DIPS dataset contains only the Cartesian coordinates and types of the atoms in each protein complex, DIPS-Plus contains multiple residue-level features, including surface proximities, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid, providing researchers with a curated feature bank for training protein interface prediction methods. We demonstrate through rigorous benchmarks that training an existing state-of-the-art (SOTA) model for PIP on DIPS-Plus yields new SOTA results, surpassing the performance of some of the latest models trained on residue-level and atom-level encodings of protein complexes.

Research Organization:
Donald Danforth Plant Science Center, St. Louis, MO (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
National Institutes of Health (NIH); National Science Foundation (NSF); USDOE; USDOE Advanced Research Projects Agency - Energy (ARPA-E); USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC05-00OR22725; AR0001213; SC0020400; SC0021303
OSTI ID:
1993653
Journal Information:
Scientific Data, Vol. 10, Issue 1; ISSN 2052-4463
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
