Scalable workflow for evaluating and optimizing large language models
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
This work describes an improved workflow for evaluating the trustworthiness of open-source large language models (LLMs). The workflow facilitates three stages: acquiring LLMs, generating LLM responses, and evaluating those responses for trustworthiness. As a use case, the workflow is applied to evaluate the truthfulness of dense, quantized, and pruned Meta Llama 3.1 LLMs. The outcome of the project could lay the groundwork for understanding and developing trustworthy models in future projects.
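The three workflow stages named in the abstract (acquisition, response generation, evaluation) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the report's implementation. The model variants are toy lookup tables standing in for downloaded dense, quantized, and pruned checkpoints. The scoring rule (case-insensitive exact match against a reference answer) is a hypothetical placeholder for whatever truthfulness metric the report actually uses. All names (`Question`, `acquire_models`, `generate_responses`, `truthfulness_score`, `run_workflow`) are hypothetical, and the printed scores are illustrative toy values, not results from the report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for an acquired model: any callable mapping a prompt to a response.
Model = Callable[[str], str]


@dataclass
class Question:
    prompt: str
    reference: str  # assumed ground-truth answer used for scoring


def acquire_models() -> Dict[str, Model]:
    """Stage 1 (placeholder): return model variants keyed by name.

    A real workflow would fetch dense, quantized, and pruned checkpoints;
    here each variant is a toy lookup table (illustrative only).
    """
    answers = {"Capital of France?": "Paris", "2 + 2?": "4"}
    return {
        "dense": lambda p: answers.get(p, "unknown"),
        "quantized": lambda p: answers.get(p, "unknown"),
        # Toy variant that answers one question wrongly, to show a score difference.
        "pruned": lambda p: "Lyon" if p == "Capital of France?" else answers.get(p, "unknown"),
    }


def generate_responses(model: Model, questions: List[Question]) -> List[str]:
    """Stage 2: run every prompt through the model."""
    return [model(q.prompt) for q in questions]


def truthfulness_score(responses: List[str], questions: List[Question]) -> float:
    """Stage 3 (hypothetical metric): fraction of case-insensitive exact matches."""
    if not questions:
        return 0.0
    matches = sum(
        r.strip().lower() == q.reference.strip().lower()
        for r, q in zip(responses, questions)
    )
    return matches / len(questions)


def run_workflow(questions: List[Question]) -> Dict[str, float]:
    """Chain the three stages and report one score per model variant."""
    return {
        name: truthfulness_score(generate_responses(model, questions), questions)
        for name, model in acquire_models().items()
    }


if __name__ == "__main__":
    qs = [Question("Capital of France?", "Paris"), Question("2 + 2?", "4")]
    print(run_workflow(qs))  # {'dense': 1.0, 'quantized': 1.0, 'pruned': 0.5}
```

Keeping each stage behind a plain function boundary means a real model loader or a different truthfulness metric could be swapped in without changing the other stages.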
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Science (SC)
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 3002371
- Report Number(s): ORNL-TM--2025-3935
- Country of Publication: United States
- Language: English
Similar Records
- Automatic building energy model development and debugging using large language models agentic workflow · Journal Article · Tue Nov 25 19:00:00 EST 2025 · Energy and Buildings · OSTI ID:2480816
- Optimizing Geospatial Assessments for Nuclear Safeguards Applications with Large Language Models · Technical Report · Tue Sep 30 00:00:00 EDT 2025 · OSTI ID:3000215
- MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models · Conference · Fri Nov 14 23:00:00 EST 2025 · OSTI ID:3010792