Ranking and Classifying AI Benchmarks

Shiraishi, Reece C.; Hawks, Benjamin G.

doi:10.2172/3019412

Conference · Wed Aug 06 00:00:00 EDT 2025 · No journal information

DOI: https://doi.org/10.2172/3019412 · OSTI ID: 3019412

Shiraishi, Reece C. [1]; Hawks, Benjamin G. [2]

[1] Cornell U.
[2] Fermilab

We created a set of standards for evaluating AI benchmarks efficiently and objectively. Although AI benchmarks have become prevalent, especially recently, there is no single accepted way to measure their effectiveness. The MLCommons team provided a set of criteria for evaluating benchmarks, but the criteria lack a clearly defined set of evaluation rules. We created a rubric with preset factors to efficiently and objectively evaluate a benchmark's quality. We also created a software framework that processes lists of benchmarks for visualization. Together, the framework and rating system allow researchers to quickly check whether their benchmarks are effective.

Research Organization: Cornell U.; Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)

Sponsoring Organization: US Department of Energy

DOE Contract Number: 89243024CSC000002

OSTI ID: 3019412

Report Number(s): FERMILAB-POSTER-25-0117-STUDENT; oai:inspirehep.net:2958943

Resource Type: Conference poster

Conference Information: Journal Name: No journal information

Country of Publication: United States

Language: English

Similar Records

Classifying and rating AI benchmarks

Journal Article · Sun Aug 24 20:00:00 EDT 2025 · No journal information · OSTI ID:3019254

An MLCommons Scientific Benchmarks Ontology

Journal Article · Wed Nov 05 23:00:00 EST 2025 · No journal information · OSTI ID:3004873

AI Benchmark Democratization and Carpentry

Journal Article · Thu Dec 11 23:00:00 EST 2025 · No journal information · OSTI ID:3008660
