Resimulation-based self-supervised learning for pretraining physics foundation models
- Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States); Institute for Artificial Intelligence and Fundamental Interactions, Cambridge, MA (United States)
- Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States); Institute for Artificial Intelligence and Fundamental Interactions, Cambridge, MA (United States); SLAC National Accelerator Laboratory (SLAC), Stanford, CA (United States)
- SLAC National Accelerator Laboratory (SLAC), Stanford, CA (United States)
- Imperial College, London (United Kingdom)
- Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
Self-supervised learning (SSL) is at the core of training modern large machine learning models, providing a scheme for learning powerful representations that can be used in a variety of downstream tasks. However, SSL strategies must be adapted to the type of training data and downstream tasks required. We propose resimulation-based self-supervised representation learning (RS3L), a novel simulation-based SSL strategy that employs a method of resimulation to drive data augmentation for contrastive learning in the physical sciences, particularly in fields that rely on stochastic simulators. By intervening in the middle of the simulation process and rerunning simulation components downstream of the intervention, we generate multiple realizations of an event, thus producing a set of augmentations covering all physics-driven variations available in the simulator. Using experiments from high-energy physics, we explore how this strategy may enable the development of a foundation model; we show how RS3L pretraining enables powerful performance in downstream tasks such as discrimination of a variety of objects and uncertainty mitigation. In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies.
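The resimulation idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' pipeline: the two-stage toy simulator (`hard_process`, `downstream`), the placeholder encoder `embed`, and the choice of a SimCLR-style NT-Xent loss (the abstract only says "contrastive learning"). The sketch fixes the upstream part of an event, reruns the stochastic downstream stage with fresh random seeds to obtain augmented views, and scores the resulting pairs with a contrastive loss.

```python
import numpy as np


def hard_process(rng, n_particles=4):
    """Upstream stage (toy, hypothetical): the part of the event held fixed."""
    return rng.normal(size=(n_particles, 3))


def downstream(particles, rng, smear=0.1):
    """Downstream stage (toy, hypothetical): random smearing standing in for
    the stochastic simulation steps rerun after the intervention point."""
    return particles + smear * rng.normal(size=particles.shape)


def resimulate(event_seed, n_views=2):
    """Generate the upstream state once, then rerun only the downstream stage."""
    particles = hard_process(np.random.default_rng(event_seed))
    return [
        downstream(particles, np.random.default_rng([event_seed, view + 1]))
        for view in range(n_views)
    ]


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss (assumed here); row i of z1 pairs with row i of z2."""
    z = np.concatenate([z1, z2])                       # shape (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # drop self-similarity
    n = len(z1)
    partner = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), partner].mean()


def embed(view):
    """Placeholder encoder: flatten the event. A trained network would go here."""
    return view.ravel()


# Two resimulated views per event, for eight toy events.
pairs = [resimulate(seed) for seed in range(8)]
z1 = np.stack([embed(a) for a, _ in pairs])
z2 = np.stack([embed(b) for _, b in pairs])

aligned = nt_xent(z1, z2)                          # views of the same event paired
shuffled = nt_xent(z1, np.roll(z2, 1, axis=0))     # views of different events paired
print(f"aligned pairs: {aligned:.3f}, shuffled pairs: {shuffled:.3f}")
```

In this toy setup the loss is lower when each view is paired with its own resimulated partner than when partners are shuffled across events, which is the signal a contrastive pretraining objective would exploit. In a realistic setting the placeholder `embed` would be a trainable network and the toy stages would be replaced by the actual simulator components.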
- Research Organization:
- SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science (BSS); National Science Foundation (NSF)
- Grant/Contract Number:
- AC02-76SF00515
- OSTI ID:
- 2575499
- Journal Information:
- Journal Name: Physical Review. D.; Journal Volume: 111; Journal Issue: 3; ISSN 2470-0010; ISSN 2470-0029
- Publisher:
- American Physical Society (APS)
- Country of Publication:
- United States
- Language:
- English
Similar Records
Masked Particle Modeling on Sets: Towards Self-Supervised High Energy Physics Foundation Models
Self-supervised Representation Learning for Astronomical Images