skip to main content

Title: Bayesian model aggregation for ensemble-based estimates of protein pKa values

This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid p$$K_a$$ predictions. Structure-based p$$K_a$$ calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exist for p$$K_a$$ prediction, ranging from empirical statistical models to {\it ab initio} quantum mechanical approaches. However, each of these methods are based on a set of assumptions that have inherent bias and sensitivities that can effect a model's accuracy and generalizability for p$$K_a$$ prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative and the pKa measurements are based on experimental work conducted by the Garc{\'i}a-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study with improvements from 40-70\% over other method classes. This work illustrates a new possible mechanism for improving the accuracy of p$$K_a$$ prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.
; ; ;
Publication Date:
OSTI Identifier:
Report Number(s):
DOE Contract Number:
Resource Type:
Journal Article
Resource Relation:
Journal Name: Proteins. Structure, Function, and Bioinformatics, 82(3):354-363
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Org:
Country of Publication:
United States
titration; electrostatics; solvation; statistics