Builtin vs. auxiliary detection of extrapolation risk.
A key assumption in supervised machine learning is that future data will be similar to historical data. This assumption is often false in real-world applications, and as a result, prediction models often return predictions that are extrapolations. We compare four approaches to estimating extrapolation risk for machine learning predictions. Two builtin methods use information available from the classification model itself to decide whether the model would be extrapolating for an input data point. The other two build auxiliary models that supplement the classification model and explicitly model extrapolation risk. Experiments with synthetic and real data sets show that the auxiliary models are more reliable risk detectors. To best safeguard against extrapolating predictions, however, we recommend combining builtin and auxiliary diagnostics.
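The abstract does not name the four methods, so the following is only a minimal sketch of the general distinction, under stated assumptions. The toy nearest-centroid classifier, the "1 minus top class probability" builtin score, and the k-nearest-neighbor distance detector are all illustrative stand-ins chosen here, not the report's methods. The sketch shows how a builtin score can stay low for a point far outside the training data while a separately fitted auxiliary detector flags it.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroidClassifier:
    """Toy stand-in for 'the classification model' (an assumption)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            members = [x for x, l in zip(X, y) if l == label]
            dims = len(members[0])
            self.centroids[label] = tuple(
                sum(p[i] for p in members) / len(members) for i in range(dims)
            )
        return self

    def predict_proba(self, x):
        # Softmax over negative distances to each class centroid.
        scores = {c: -dist(x, m) for c, m in self.centroids.items()}
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}


def builtin_risk(model, x):
    """Hypothetical builtin diagnostic: 1 minus the top class probability."""
    return 1.0 - max(model.predict_proba(x).values())


class KnnDistanceDetector:
    """Hypothetical auxiliary model: distance to the k-th nearest training point."""

    def fit(self, X, k=5, quantile=0.95):
        self.X, self.k = list(X), k
        # Leave-one-out scores on the training set define the alarm threshold.
        scores = sorted(self._score(x, skip_self=True) for x in self.X)
        self.threshold = scores[min(len(scores) - 1, int(quantile * len(scores)))]
        return self

    def _score(self, x, skip_self=False):
        d = sorted(dist(x, t) for t in self.X)
        if skip_self:
            d = d[1:]  # drop the zero distance from the point to itself
        return d[self.k - 1]

    def is_extrapolation(self, x):
        return self._score(x) > self.threshold


# Synthetic training data: two Gaussian clusters (made up for illustration).
rng = random.Random(0)
X = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(100)]
X += [(rng.gauss(4.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(100)]
y = [0] * 100 + [1] * 100

model = NearestCentroidClassifier().fit(X, y)
detector = KnnDistanceDetector().fit(X)

near = (0.0, 0.0)   # inside the training distribution
far = (30.0, 0.0)   # far outside anything seen in training

for name, point in (("near", near), ("far", far)):
    print(name, "builtin risk:", round(builtin_risk(model, point), 3),
          "auxiliary flag:", detector.is_extrapolation(point))
```

In this toy setup the classifier remains confident at the distant point because that point is still clearly closer to one centroid than the other, which is one way a builtin signal can miss extrapolation. It illustrates why combining both kinds of diagnostics can be useful, but it is not evidence for the report's experimental findings.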
- Research Organization: Sandia National Lab. (SNL-CA), Livermore, CA (United States)
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number: AC04-94AL85000
- OSTI ID: 1095941
- Report Number(s): SAND2013-2534; 463450
- Country of Publication: United States
- Language: English