DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: CaXML: Chemistry‐informed machine learning explains mutual changes between protein conformations and calcium ions in calcium‐binding proteins using structural and topological features

Journal Article · Protein Science
DOI: https://doi.org/10.1002/pro.70023 · OSTI ID:2526238
[1]; [2]; [3]; [4]; [5]; [6]
  1. Houston Methodist Research Institutes, TX (United States)
  2. Univ. of Washington, Seattle, WA (United States)
  3. Univ. of Houston, TX (United States); Holon Institute of Technology (HIT) (Israel)
  4. Univ. of Houston, TX (United States)
  5. Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA (United States)
  6. Univ. of Washington, Seattle, WA (United States); Pacific Northwest National Laboratory (PNNL), Richland, WA (United States). Environmental Molecular Sciences Laboratory (EMSL)

Protein flexibility is a feature that communicates changes in cell signaling instigated by binding to secondary messengers, such as calcium ions, which are associated with the coordination of muscle contraction, neurotransmitter release, and gene expression. When binding to the disordered parts of a protein, calcium ions must balance their charge states with the shape of calcium-binding proteins and their versatile pool of partners, depending on the circumstances they transmit. Accurately determining the ionic charges of those ions is essential for understanding their role in such processes. However, it is unclear whether the limited experimental data available can be used effectively to train models that accurately predict the charges of calcium-binding protein variants. Here, we developed a chemistry-informed machine-learning algorithm that implements a game-theoretic approach to explain the output of a machine-learning model without requiring an excessively large database for high-performance prediction of atomic charges. We used ab initio electronic structure data representing calcium ions and the structures of the disordered segments of calcium-binding peptides with surrounding water molecules to train several explainable models. Network theory was used to extract the topological features of atomic interactions in the structurally complex data dictated by the coordination chemistry of a calcium ion, a potent indicator of its charge state in a protein. Our design produced a computational tool, CaXML, which provides an explainable machine-learning framework for annotating the ionic charges of calcium ions in calcium-binding proteins in response to chemical changes in the environment. Our framework will provide new insights into protein design for engineering functionality based on scientific data of limited size in a genome space.
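The game-theoretic explanation mentioned in the abstract refers to Shapley values. As a rough illustration only, and not the CaXML implementation, the sketch below computes exact Shapley values for a toy charge model. The feature names (coordination_number, mean_ca_o_distance, n_waters), the baseline and instance values, and the model coefficients are all hypothetical and were chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial

# Hypothetical reference environment and one hypothetical binding site.
BASELINE = {"coordination_number": 6.0, "mean_ca_o_distance": 2.4, "n_waters": 2.0}
INSTANCE = {"coordination_number": 7.0, "mean_ca_o_distance": 2.3, "n_waters": 1.0}


def toy_charge_model(x):
    """Made-up stand-in for a trained model; coefficients are not fitted."""
    return (
        2.0
        - 0.08 * x["coordination_number"]
        + 0.5 * (x["mean_ca_o_distance"] - 2.4)
        - 0.02 * x["coordination_number"] * x["n_waters"]
    )


def coalition_value(present):
    """Model output when only the features in `present` take instance values."""
    x = {f: (INSTANCE[f] if f in present else BASELINE[f]) for f in BASELINE}
    return toy_charge_model(x)


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


phi = shapley_values(coalition_value, list(BASELINE))
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the model output for the instance and for the baseline, which is the property that makes Shapley values usable as a per-feature explanation. Exact enumeration scales exponentially with the number of features, so practical tools rely on approximations or model-specific algorithms instead.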

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
National Institutes of Health (NIH); National Science Foundation (NSF); USDOE
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
2526238
Report Number(s):
PNNL-SA--199576
Journal Information:
Journal Name: Protein Science; Journal Volume: 34; Journal Issue: 2; ISSN 0961-8368
Publisher:
Wiley -- The Protein Society
Country of Publication:
United States
Language:
English
