Scalable workflow for evaluating and optimizing large language models

Jin, Zheming

doi:10.2172/3002371

Technical Report · Sat May 31 20:00:00 EDT 2025

DOI: https://doi.org/10.2172/3002371 · OSTI ID: 3002371

Jin, Zheming [1]

[1] Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

This work describes an improved workflow for evaluating the trustworthiness of open-source large language models (LLMs). The workflow facilitates the acquisition of LLMs, the generation of LLM responses, and the evaluation of those responses for trustworthiness. As a use case, the workflow is employed to evaluate the truthfulness of dense, quantized, and pruned Meta Llama 3.1 LLMs. The outcome of the project could set the stage for understanding and developing trustworthy models in future projects.
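The three stages named in the abstract (acquisition, response generation, evaluation) can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the stub functions stand in for real model variants, the function names are hypothetical, and exact-match scoring is a placeholder rather than the truthfulness metric used in the report.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for model variants. A real workflow would load actual
# checkpoints here; these stubs only make the sketch runnable.
def dense_stub(prompt: str) -> str:
    answers = {
        "What is the capital of France?": "Paris",
        "How many days are in a week?": "7",
    }
    return answers.get(prompt, "unknown")


def quantized_stub(prompt: str) -> str:
    # Deliberately knows fewer answers, to show scores differing by variant.
    answers = {"What is the capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


def acquire_models() -> Dict[str, Callable[[str], str]]:
    """Stage 1 (assumed shape): return the model variants keyed by name."""
    return {"dense": dense_stub, "quantized": quantized_stub}


def generate_responses(model: Callable[[str], str], prompts: List[str]) -> List[str]:
    """Stage 2: collect one response per prompt."""
    return [model(p) for p in prompts]


def evaluate(responses: List[str], references: List[str]) -> float:
    """Stage 3 (placeholder metric): fraction of case-insensitive exact matches."""
    matches = sum(
        r.strip().lower() == ref.strip().lower()
        for r, ref in zip(responses, references)
    )
    return matches / len(references)


def run_workflow(dataset: List[Tuple[str, str]]) -> Dict[str, float]:
    """Run all three stages for every model variant and return a score per variant."""
    prompts = [question for question, _ in dataset]
    references = [answer for _, answer in dataset]
    scores = {}
    for name, model in acquire_models().items():
        scores[name] = evaluate(generate_responses(model, prompts), references)
    return scores


if __name__ == "__main__":
    toy_dataset = [
        ("What is the capital of France?", "Paris"),
        ("How many days are in a week?", "7"),
    ]
    print(run_workflow(toy_dataset))
```

Keeping each stage behind its own function means a different model variant or scoring method can be swapped in without touching the other stages.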

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Science (SC)

DOE Contract Number: AC05-00OR22725

OSTI ID: 3002371

Report Number(s): ORNL-TM--2025-3935

Country of Publication: United States

Language: English

Similar Records

Automatic building energy model development and debugging using large language models agentic workflow

Journal Article · Tue Nov 25 19:00:00 EST 2025 · Energy and Buildings · OSTI ID:2480816

Optimizing Geospatial Assessments for Nuclear Safeguards Applications with Large Language Models

Technical Report · Tue Sep 30 00:00:00 EDT 2025 · OSTI ID:3000215

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models

Conference · Fri Nov 14 23:00:00 EST 2025 · OSTI ID:3010792

Related Subjects

97 MATHEMATICS AND COMPUTING
