U.S. Department of Energy
Office of Scientific and Technical Information

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models

Conference ·
OSTI ID: 3010792

Mixture of Experts (MoE) models have enabled the scaling of Large Language Models (LLMs) and Vision Language Models (VLMs) by achieving massive parameter counts while maintaining computational efficiency. However, MoEs introduce several inference-time challenges, including load imbalance across experts and additional computational overhead from routing. To address these challenges and fully harness the benefits of MoE, a systematic evaluation of hardware acceleration techniques is essential. We present MoE-Inference-Bench, a comprehensive study evaluating MoE performance across diverse scenarios. We analyze the impact on throughput of batch size, sequence length, and critical MoE hyperparameters such as FFN dimensions and the number of experts. We evaluate several optimization techniques on Nvidia H100 GPUs, including pruning, fused MoE operations, speculative decoding, quantization, and various parallelization strategies. Our evaluation covers MoEs from the Mixtral, DeepSeek, OLMoE, and Qwen families. The results reveal performance differences across configurations and provide insights for the efficient deployment of MoEs.
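The routing overhead and load imbalance mentioned above can be illustrated with a minimal top-k routing sketch. This is not the paper's implementation; all names (`num_experts`, `top_k`, etc.) and the random-projection router are illustrative assumptions, using NumPy only.

```python
import numpy as np

rng = np.random.default_rng(0)

num_tokens, hidden_dim = 8, 16
num_experts, top_k = 4, 2

# Toy token activations and a linear router (assumed for illustration).
tokens = rng.standard_normal((num_tokens, hidden_dim))
router_w = rng.standard_normal((hidden_dim, num_experts))

# Router step: per-token logits over experts, softmax, then pick top-k.
# This extra matmul + sort is the per-token routing overhead.
logits = tokens @ router_w
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
topk = np.argsort(probs, axis=1)[:, -top_k:]  # chosen expert indices per token

# Tokens per expert: uneven counts are the load-imbalance problem,
# since the busiest expert bounds the step latency.
load = np.bincount(topk.ravel(), minlength=num_experts)
print("tokens routed to each expert:", load)
```

With a learned router the skew is typically worse than with random weights, which is why auxiliary load-balancing losses and capacity limits are common in MoE training and serving.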

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
Report Number(s):
PNNL-SA-217510
Resource Relation:
16th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems
Country of Publication:
United States
Language:
English

Similar Records

LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
Conference · 2024 · OSTI ID: 2563712

Scalable workflow for evaluating and optimizing large language models
Technical Report · 2025 · OSTI ID: 3002371

Fuzzy relational inference language for expert systems
Book · 1982 · OSTI ID: 5421218
