Bayesian model aggregation for ensemble-based estimates of protein pKa values
This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pKa predictions. Structure-based pKa calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exists for pKa prediction, ranging from empirical statistical models to ab initio quantum mechanical approaches. However, each of these methods is based on a set of assumptions with inherent biases and sensitivities that can affect a model's accuracy and generalizability for pKa prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative, and the pKa measurements are based on experimental work conducted by the García-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study, with improvements of 40-70% over other method classes. This work illustrates a new possible mechanism for improving the accuracy of pKa prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.
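The averaging idea can be illustrated with a minimal sketch. It is not the procedure used in the paper, whose fitting details are not given in this abstract. The sketch assumes a uniform prior over methods and independent Gaussian errors with a fixed, shared standard deviation `sigma`. The function names `bma_weights` and `bma_predict` and all numbers are hypothetical and serve only to show how posterior weights turn several predictions into one aggregate estimate.

```python
import math


def bma_weights(predictions, observed, sigma=1.0):
    """Posterior model weights under a uniform prior and Gaussian errors.

    predictions: one list of predicted pKa values per method
    observed:    measured pKa values, in the same residue order
    sigma:       assumed shared error standard deviation (illustrative choice)
    """
    log_liks = []
    for preds in predictions:
        sse = sum((p - o) ** 2 for p, o in zip(preds, observed))
        # Gaussian log-likelihood, dropping the constant shared by all methods
        log_liks.append(-sse / (2.0 * sigma ** 2))
    # Subtract the maximum before exponentiating for numerical stability
    top = max(log_liks)
    unnormalized = [math.exp(ll - top) for ll in log_liks]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def bma_predict(weights, new_predictions):
    """Weighted average of the methods' predictions for one new residue."""
    return sum(w * p for w, p in zip(weights, new_predictions))


# Toy numbers invented for illustration; they are not data from the paper.
observed = [4.0, 6.5, 10.2]
predictions = [
    [4.1, 6.4, 10.0],  # hypothetical method A
    [5.0, 7.5, 9.0],   # hypothetical method B
    [3.0, 6.0, 11.5],  # hypothetical method C
]
weights = bma_weights(predictions, observed)
estimate = bma_predict(weights, [7.1, 7.9, 6.6])
```

With these toy inputs, the method whose training predictions lie closest to the measurements receives the largest weight, and the aggregate estimate for a new residue always falls between the smallest and largest individual predictions.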
- Research Organization:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1129305
- Report Number(s):
- PNNL-SA-95333; 400412000
- Journal Information:
- Proteins. Structure, Function, and Bioinformatics, 82(3):354-363
- Country of Publication:
- United States
- Language:
- English
Similar Records
Bayesian Model Averaging for Ensemble-Based Estimates of Solvation Free Energies
Progress in the prediction of pKa values in proteins