Domain-specific text embedding model for accelerator physics
Accelerator physics presents unique challenges for natural language processing (NLP) due to its specialized terminology and complex concepts. A key component in overcoming these challenges is the development of robust text embedding models that transform textual data into dense vector representations, facilitating efficient information retrieval and semantic understanding. In this work, we introduce AccPhysBERT, a sentence embedding model fine-tuned specifically for accelerator physics. Our model demonstrates superior performance across a range of downstream NLP tasks, surpassing existing models in capturing the domain-specific nuances of the field. We further showcase its practical applications, including semantic paper-reviewer matching and integration into retrieval-augmented generation systems, highlighting its potential to enhance information retrieval and knowledge discovery in accelerator physics.
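As a rough illustration of the semantic paper-reviewer matching mentioned in the abstract, the sketch below ranks reviewers by cosine similarity between a paper embedding and each reviewer embedding. It is a minimal sketch under stated assumptions, not the authors' implementation: the three-dimensional vectors are hypothetical stand-ins for real sentence embeddings, and the reviewer names and the `rank_reviewers` helper are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_reviewers(paper_vec, reviewer_vecs):
    """Return (name, score) pairs sorted from most to least similar.

    Hypothetical helper: the record does not describe how matching is scored.
    """
    scores = {
        name: cosine_similarity(paper_vec, vec)
        for name, vec in reviewer_vecs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy vectors standing in for embeddings; a real sentence embedding model
# would produce vectors with hundreds of dimensions.
paper = [0.9, 0.1, 0.2]
reviewers = {
    "reviewer_a": [0.8, 0.2, 0.1],
    "reviewer_b": [0.1, 0.9, 0.3],
    "reviewer_c": [0.2, 0.1, 0.9],
}

ranking = rank_reviewers(paper, reviewers)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice the toy vectors would be replaced by embeddings of the paper abstract and of each reviewer's prior publications; the ranking step itself stays the same.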
Published by the American Physical Society 2025
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- US Department of Energy; USDOE; USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22), Scientific User Facilities Division (SC-22.3)
- Grant/Contract Number:
- AC02-05CH11231
- OSTI ID:
- 2556923
- Alternate ID(s):
- OSTI ID: 2570163
- Journal Information:
- Journal Name: Physical Review Accelerators and Beams; Vol. 28; Journal Issue: 4; ISSN 2469-9888; ISSN PRABCJ
- Publisher:
- American Physical Society
- Country of Publication:
- United States
- Language:
- English
Similar Records
PhysBERT: A text embedding model for physics scientific literature