U.S. Department of Energy
Office of Scientific and Technical Information

Domain-specific text embedding model for accelerator physics

Journal Article · Physical Review Accelerators and Beams

Accelerator physics presents unique challenges for natural language processing (NLP) due to its specialized terminology and complex concepts. A key component in overcoming these challenges is the development of robust text embedding models that transform textual data into dense vector representations, facilitating efficient information retrieval and semantic understanding. In this work, we introduce AccPhysBERT, a sentence embedding model fine-tuned specifically for accelerator physics. Our model demonstrates superior performance across a range of downstream NLP tasks, surpassing existing models in capturing the domain-specific nuances of the field. We further showcase its practical applications, including semantic paper-reviewer matching and integration into retrieval-augmented generation systems, highlighting its potential to enhance information retrieval and knowledge discovery in accelerator physics.
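The abstract describes embeddings as dense vectors that support information retrieval and paper-reviewer matching. The sketch below illustrates only the generic comparison step, under stated assumptions: the embedding vectors are already computed (toy 3-dimensional vectors stand in for real model outputs), similarity is measured with cosine similarity, and a reviewer's profile is taken to be the mean of the embeddings of that reviewer's papers. The function names (`cosine_similarity`, `rank_by_similarity`, `match_reviewers`) are hypothetical and do not come from AccPhysBERT or the paper. This is an illustration, not the authors' pipeline.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (id, score) pairs, most similar first."""
    scores = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def match_reviewers(
    paper_vec: Sequence[float],
    reviewer_papers: Dict[str, List[Sequence[float]]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Hypothetical matching rule: score each reviewer by the similarity between
    the submitted paper and the mean embedding of the reviewer's own papers."""
    profiles = {name: mean_vector(vecs) for name, vecs in reviewer_papers.items()}
    return rank_by_similarity(paper_vec, profiles, top_k)


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings of documents.
    corpus = {
        "lattice": [0.9, 0.1, 0.0],
        "rf": [0.1, 0.9, 0.1],
        "ml": [0.2, 0.2, 0.9],
    }
    query = [0.8, 0.2, 0.1]
    print(rank_by_similarity(query, corpus, top_k=2))

    reviewers = {
        "alice": [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1]],
        "bob": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
    }
    print(match_reviewers(query, reviewers, top_k=1))
```

In a retrieval-augmented generation setup, the top-ranked passages from a step like `rank_by_similarity` would typically be handed to a language model as context. The specific retrieval configuration used with AccPhysBERT is not described in this record.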

Published by the American Physical Society 2025
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
US Department of Energy; USDOE; USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22), Scientific User Facilities Division (SC-22.3)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
2556923
Alternate ID(s):
OSTI ID: 2570163
Journal Information:
Journal Name: Physical Review Accelerators and Beams; Journal Volume: 28; Journal Issue: 4; ISSN 2469-9888; CODEN PRABCJ
Publisher:
American Physical Society
Country of Publication:
United States
Language:
English

Similar Records

Field space geometry and nonlinear supersymmetry
Journal Article · Wed May 07 20:00:00 EDT 2025 · Physical Review. D. · OSTI ID:2565039

PhysBERT: A text embedding model for physics scientific literature
Journal Article · Mon Oct 28 20:00:00 EDT 2024 · APL Machine Learning · OSTI ID:2564799

Related Subjects