Prompt Phrase Ordering Using Large Language Models in HPC: Evaluating Prompt Sensitivity
- Rice Univ., Houston, TX (United States)
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Large language models (LLMs) have demonstrated effective performance in domain-specific tasks, often requiring a well-designed prompt to guide their responses. However, finding the right prompt is challenging because of prompt sensitivity, the phenomenon where small changes in the prompt can lead to significant variations in performance. In this study, we evaluated prompt performance across all permutations of a sequence of independent phrases to investigate prompt sensitivity and robustness. We used two datasets: the GSM8k dataset, which assesses mathematical reasoning, and a custom template prompt for summarizing database metadata. The study was conducted using the llama3-instruct-7B model hosted on Ollama, with computations parallelized in a high-performance computing environment. By comparing the average index of phrases in the best- and worst-performing prompts, we found that the order of independent phrases within a prompt significantly impacts LLM performance. Additionally, we used Hamming distance to measure how much one phrase ordering differs from another, and concluded that prompt modifications can dramatically affect scores, often in a way that appears nearly random. These findings support existing research on prompt sensitivity. We discuss the challenges of prompt optimization, noting that altering phrases in a successful prompt does not always result in another successful prompt.
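The two ordering measures named in the abstract, the average index of a phrase and the Hamming distance between orderings, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the phrase labels, the `toy_score` function, and the choice of two best and two worst orderings are assumptions made purely for illustration. In an actual experiment each ordering would be scored by querying the model.

```python
from itertools import permutations
from statistics import mean


def hamming_distance(order_a, order_b):
    """Count the positions at which two equal-length orderings differ."""
    if len(order_a) != len(order_b):
        raise ValueError("orderings must have the same length")
    return sum(a != b for a, b in zip(order_a, order_b))


def average_index(orderings, phrase):
    """Mean 0-based position of `phrase` across a collection of orderings."""
    return mean(order.index(phrase) for order in orderings)


# Hypothetical phrase labels (assumption), not the phrases used in the study.
phrases = ("A", "B", "C")
all_orders = list(permutations(phrases))  # 3! = 6 orderings


def toy_score(order):
    """Placeholder score (assumption): rewards placing phrase "A" early.

    A real run would build a prompt from `order`, send it to the model,
    and score the response against the dataset.
    """
    return len(order) - order.index("A")


# Rank orderings from highest to lowest score, then take both extremes.
ranked = sorted(all_orders, key=toy_score, reverse=True)
best, worst = ranked[:2], ranked[-2:]

print("avg index of A in best: ", average_index(best, "A"))
print("avg index of A in worst:", average_index(worst, "A"))
print("Hamming distance:", hamming_distance(("A", "B", "C"), ("A", "C", "B")))
```

Because the number of orderings grows factorially with the number of phrases, exhaustive evaluation of this kind becomes expensive quickly, which is consistent with the abstract's mention of parallelizing the computations in a high-performance computing environment.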
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 2573371
- Report Number(s):
- ORNL/TM--2025/3753; PUB-ID-229735
- Country of Publication:
- United States
- Language:
- English