Classifying and rating AI benchmarks
Journal Article · No journal information · OSTI ID: 3019254
- Fermilab
We created a set of standards for evaluating AI benchmarks efficiently and objectively. Although AI benchmarks have become increasingly prevalent in recent years, there is no single accepted way to measure their effectiveness. The MLCommons team provided a set of criteria for evaluating benchmarks, but these criteria lack clearly defined evaluation rules. We created a rubric with preset factors to evaluate a benchmark's quality efficiently and objectively, along with a software framework that processes lists of benchmarks for visualization. Together, the framework and rating system allow researchers to quickly check whether their benchmarks are effective.
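The abstract does not list the rubric's factors or its scoring scale. The following is only a minimal sketch of how a rubric with preset factors can produce objective, repeatable ratings. It assumes four hypothetical factor names (not the paper's), an assumed 0 to 5 score per factor, and an unweighted mean normalized to [0, 1] as the aggregate; `Benchmark`, `rate`, and `rank` are illustrative names, not part of the authors' framework.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical rubric factors for illustration only; NOT the paper's actual factors.
FACTORS = ("documentation", "dataset_availability", "metrics_defined", "reproducibility")
MAX_SCORE = 5  # assumed per-factor scale: integer scores from 0 to MAX_SCORE


@dataclass
class Benchmark:
    name: str
    scores: Dict[str, int]  # factor name -> score in 0..MAX_SCORE


def rate(benchmark: Benchmark) -> float:
    """Return the mean rubric score, normalized to [0, 1]."""
    total = 0
    for factor in FACTORS:
        score = benchmark.scores.get(factor)
        # Every preset factor must be scored, and within the allowed range.
        if score is None or not 0 <= score <= MAX_SCORE:
            raise ValueError(f"{benchmark.name}: missing or invalid score for {factor!r}")
        total += score
    return total / (len(FACTORS) * MAX_SCORE)


def rank(benchmarks: List[Benchmark]) -> List[Benchmark]:
    """Sort benchmarks from highest to lowest rating."""
    return sorted(benchmarks, key=rate, reverse=True)


if __name__ == "__main__":
    a = Benchmark("bench-a", {"documentation": 5, "dataset_availability": 4,
                              "metrics_defined": 5, "reproducibility": 2})
    b = Benchmark("bench-b", {factor: 3 for factor in FACTORS})
    for bm in rank([b, a]):
        print(f"{bm.name}: {rate(bm):.2f}")
    # prints "bench-a: 0.80" then "bench-b: 0.60"
```

Fixing the factor list and scale up front is what makes such a rating objective in the sense the abstract describes: two reviewers who enter the same per-factor scores always obtain the same rating and ranking.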
- Research Organization: Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
- Sponsoring Organization: US Department of Energy
- DOE Contract Number: 89243024CSC000002
- OSTI ID: 3019254
- Report Number(s): FERMILAB-PUB-25-0478-STUDENT; oai:inspirehep.net:2963459
- Journal Information: No journal information
- Country of Publication: United States
- Language: English
Similar Records
- Ranking and Classifying AI Benchmarks · Conference · Tue Aug 05 20:00:00 EDT 2025 · No journal information · OSTI ID: 3019412
- An MLCommons Scientific Benchmarks Ontology · Journal Article · Wed Nov 05 23:00:00 EST 2025 · No journal information · OSTI ID: 3004873
- AI Benchmark Democratization and Carpentry · Journal Article · Thu Dec 11 23:00:00 EST 2025 · No journal information · OSTI ID: 3008660