Parameter uncertainties for imperfect surrogate models in the low-noise regime
Abstract
Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, this loss ignores model form error, or misspecification, meaning parameter uncertainties are significantly underestimated and vanish in the large data limit. As misspecification is the main source of uncertainty for surrogate models of low-noise calculations, such as those arising in atomistic simulation, predictive uncertainties are systematically underestimated. We analyze the true generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show that posterior parameter distributions must cover every training point to avoid a divergence in the generalization error and design a compatible ansatz which incurs minimal overhead for linear models. The approach is demonstrated on model problems before application to thousand-dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors in terms of parameter uncertainties, allowing this important source of uncertainty to be incorporated in multi-scale computational workflows.
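The claim that standard Bayesian parameter uncertainties vanish in the large data limit even when the model is misspecified can be illustrated with a toy problem. The sketch below is not taken from the paper: the target function `sin(2x)`, the straight-line feature map, the known noise level `noise_std`, and the helper names `posterior`, `features`, `target`, and `experiment` are all assumptions chosen for illustration. It fits a deliberately misspecified linear model to near-deterministic data with conjugate Bayesian linear regression and compares the test error with the predictive spread implied by the posterior over parameters.

```python
import numpy as np

def posterior(Phi, y, noise_std, prior_precision=1e-6):
    """Gaussian posterior for Bayesian linear regression with known noise.

    Assumes a zero-mean isotropic Gaussian prior (an illustrative choice).
    """
    A = prior_precision * np.eye(Phi.shape[1]) + Phi.T @ Phi / noise_std**2
    cov = np.linalg.inv(A)
    mean = cov @ Phi.T @ y / noise_std**2
    return mean, cov

def features(x):
    # Deliberately misspecified: a straight line fitted to a curved target.
    return np.stack([np.ones_like(x), x], axis=1)

def target(x):
    # Hypothetical near-deterministic "simulation" output.
    return np.sin(2.0 * x)

def experiment(n_train, noise_std=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(-1.0, 1.0, n_train)
    y_train = target(x_train) + noise_std * rng.normal(size=n_train)
    mean, cov = posterior(features(x_train), y_train, noise_std)

    x_test = np.linspace(-1.0, 1.0, 201)
    Phi_test = features(x_test)
    # Actual test error of the posterior-mean prediction.
    rmse = np.sqrt(np.mean((Phi_test @ mean - target(x_test)) ** 2))
    # Root-mean predictive variance induced by parameter uncertainty alone.
    var = np.einsum("ij,jk,ik->i", Phi_test, cov, Phi_test)
    pred_std = np.sqrt(np.mean(var))
    return rmse, pred_std

results = {n: experiment(n) for n in (10, 100, 1000, 10000)}
for n, (rmse, std) in results.items():
    print(f"N={n:6d}  test RMSE={rmse:.3e}  parameter-induced std={std:.3e}")
```

In this toy setting the test error stays roughly constant as the training set grows, because it is dominated by the model form error, while the parameter-induced spread keeps shrinking. This is the mismatch the abstract describes. The misspecification-aware scheme proposed in the paper is not implemented here.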
- Sponsoring Organization:
- USDOE
- OSTI ID:
- 2499832
- Journal Information:
- Journal Name: Machine Learning: Science and Technology; Journal Volume: 6; Journal Issue: 1; ISSN 2632-2153
- Publisher:
- IOP Publishing
- Country of Publication:
- United Kingdom
- Language:
- English
Similar Records
Data-Driven Compositional Optimization in Misspecified Regimes
Simulation–Extrapolation for Bias Correction with Exposure Uncertainty in Radiation Risk Analysis Utilizing Grouped Data