U.S. Department of Energy
Office of Scientific and Technical Information

Scalable workflow for evaluating and optimizing large language models

Technical Report · DOI: https://doi.org/10.2172/3002371 · OSTI ID: 3002371
 [1]
  1. Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

This work describes an improved workflow for evaluating open-source large language models (LLMs) for trustworthiness. The workflow facilitates the acquisition of LLMs, the generation of LLM responses, and the evaluation of those responses for trustworthiness. As a use case, the workflow is employed to evaluate dense, quantized, and pruned Meta Llama 3.1 LLMs for truthfulness. The outcome of the project could set the stage for understanding and developing trustworthy models in future projects.
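
The three stages named in the abstract (acquisition, response generation, evaluation) can be pictured with a minimal Python sketch. This is an illustration only, not the report's implementation: every name here (`Item`, `acquire_models`, `generate_responses`, `truthfulness_score`, `run_workflow`) is hypothetical, the "models" are toy lookup functions standing in for real checkpoints, and exact-match scoring against a reference answer is an assumed stand-in for whatever truthfulness metric the report actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical; none are taken from the report.
Model = Callable[[str], str]


@dataclass
class Item:
    prompt: str
    reference: str  # answer treated as truthful for scoring


def acquire_models() -> Dict[str, Model]:
    """Stage 1 (stand-in): return named model variants.

    A real workflow would fetch model checkpoints; here each
    variant is a toy lookup function so the sketch runs offline.
    """
    def dense(prompt: str) -> str:
        answers = {"Capital of France?": "Paris", "2 + 2?": "4"}
        return answers.get(prompt, "unknown")

    def pruned(prompt: str) -> str:
        answers = {"Capital of France?": "Paris"}
        return answers.get(prompt, "unknown")

    return {"dense": dense, "pruned": pruned}


def generate_responses(model: Model, items: List[Item]) -> List[str]:
    """Stage 2: collect one response per prompt."""
    return [model(item.prompt) for item in items]


def truthfulness_score(responses: List[str], items: List[Item]) -> float:
    """Stage 3 (assumed metric): fraction of exact matches, ignoring case."""
    if not items:
        return 0.0
    correct = sum(
        response.strip().lower() == item.reference.strip().lower()
        for response, item in zip(responses, items)
    )
    return correct / len(items)


def run_workflow(items: List[Item]) -> Dict[str, float]:
    """Run all three stages and return a score per model variant."""
    return {
        name: truthfulness_score(generate_responses(model, items), items)
        for name, model in acquire_models().items()
    }


ITEMS = [Item("Capital of France?", "Paris"), Item("2 + 2?", "4")]
SCORES = run_workflow(ITEMS)
```

Keeping each stage behind its own function means a model variant or a scoring rule could be swapped without touching the rest of the pipeline, which is one plausible reading of what a scalable workflow requires.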

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
3002371
Report Number(s):
ORNL-TM--2025-3935
Country of Publication:
United States
Language:
English

Similar Records

Automatic building energy model development and debugging using large language models agentic workflow
Journal Article · Tue Nov 25 19:00:00 EST 2025 · Energy and Buildings · OSTI ID:2480816

Optimizing Geospatial Assessments for Nuclear Safeguards Applications with Large Language Models
Technical Report · Tue Sep 30 00:00:00 EDT 2025 · OSTI ID:3000215

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
Conference · Fri Nov 14 23:00:00 EST 2025 · OSTI ID:3010792
