Classifying and rating AI benchmarks

Shiraishi, Reece

Journal Article · Mon Aug 25 00:00:00 EDT 2025 · No journal information

OSTI ID:3019254

Shiraishi, Reece (Fermilab)

We created a set of standards for evaluating AI benchmarks efficiently and objectively. Although AI benchmarks have become increasingly prevalent, there is no single accepted way to measure their effectiveness. The MLCommons team provided a set of criteria for evaluating benchmarks, but these criteria lack a clearly defined set of evaluation rules. We created a rubric with preset factors to evaluate a benchmark’s quality efficiently and objectively, and a software framework that processes lists of benchmarks for visualization. Together, the framework and rating system allow researchers to quickly check whether their benchmarks are effective.
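The abstract does not list the rubric's factors or say how factor scores are combined into a rating. The sketch below only illustrates the general idea of a preset-factor rubric applied to a list of benchmarks. It assumes a hypothetical set of four factors (`documentation`, `dataset_access`, `reproducibility`, `metric_definition`), each scored on a 0 to 5 scale and combined by an unweighted, normalized mean. None of these names, the scale, or the `rate`/`rank` helpers come from the report.

```python
from dataclasses import dataclass, field

# Hypothetical rubric: these factor names and the 0-5 scale are illustrative
# assumptions, not the factors defined in the report.
RUBRIC_FACTORS = ("documentation", "dataset_access", "reproducibility", "metric_definition")
MAX_SCORE = 5


@dataclass
class Benchmark:
    name: str
    scores: dict[str, int] = field(default_factory=dict)


def rate(benchmark: Benchmark) -> float:
    """Return the benchmark's rubric score normalized to [0, 1].

    Missing factors count as 0, so incomplete entries are penalized
    rather than silently skipped.
    """
    total = 0
    for factor in RUBRIC_FACTORS:
        score = benchmark.scores.get(factor, 0)
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"{benchmark.name}: {factor} score {score} outside 0..{MAX_SCORE}")
        total += score
    return total / (MAX_SCORE * len(RUBRIC_FACTORS))


def rank(benchmarks: list[Benchmark]) -> list[tuple[str, float]]:
    """Rate every benchmark in a list and sort best-first, ready for plotting."""
    rated = [(b.name, rate(b)) for b in benchmarks]
    return sorted(rated, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical entries with made-up scores, for illustration only.
    entries = [
        Benchmark("bench-a", {"documentation": 5, "dataset_access": 4,
                              "reproducibility": 3, "metric_definition": 5}),
        Benchmark("bench-b", {"documentation": 2, "dataset_access": 5}),
    ]
    for name, rating in rank(entries):
        print(f"{name}: {rating:.2f}")
```

Treating an unscored factor as 0 keeps every rating on the same scale. A real rubric might instead weight factors differently or flag incomplete entries separately. The sorted `(name, rating)` pairs can be passed directly to a bar chart or table.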

Research Organization:: Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)

Sponsoring Organization:: US Department of Energy

DOE Contract Number:: 89243024CSC000002

Report Number(s):: FERMILAB-PUB-25-0478-STUDENT; oai:inspirehep.net:2963459

Journal Information:: No journal information

Country of Publication:: United States

Language:: English

Similar Records

Ranking and Classifying AI Benchmarks

Conference · Tue Aug 05 20:00:00 EDT 2025 · No journal information · OSTI ID:3019412

An MLCommons Scientific Benchmarks Ontology

Journal Article · Wed Nov 05 23:00:00 EST 2025 · No journal information · OSTI ID:3004873

AI Benchmark Democratization and Carpentry

Journal Article · Thu Dec 11 23:00:00 EST 2025 · No journal information · OSTI ID:3008660
