What question did this study set out to answer?

This research aims to improve the efficiency of Mixture-of-Experts models by dynamically selecting the number of experts based on routing entropy.

January 20, 2026 | Open Access

Entropy-Guided Dynamic Expert Selection in Mixture-of-Experts Models

Key Points

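Developed Adaptive-K routing, which chooses the number of experts per token dynamically instead of using a fixed top-k.
Used routing entropy as the measure of router confidence: low entropy means the router is confident, high entropy means it is uncertain.
Evaluated the method on three MoE models: Mixtral 8x7B, Qwen-MoE, and OLMoE-1B-7B.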
Achieved 52.5% compute reduction with Mixtral 8x7B.
Attained 32.4% compute reduction with Qwen-MoE.
Realized 24.7% compute reduction with OLMoE-1B-7B.
Combined with quantization and speculative decoding, yielded up to 96% total compute savings through multiplicative composition.
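To make "multiplicative composition" concrete, here is a minimal sketch of the arithmetic, assuming each technique independently scales the remaining compute. The function names are hypothetical, and the summary does not report the individual contributions of quantization or speculative decoding, nor which model the 96% figure refers to. The Mixtral pairing below is therefore an assumption made only for illustration.

```python
def combined_savings(reductions):
    """Total fractional savings when independent reductions compose multiplicatively.

    Each entry is a fractional reduction in [0, 1); the remaining cost is the
    product of the individual remaining fractions (1 - r).
    """
    remaining = 1.0
    for r in reductions:
        if not 0.0 <= r < 1.0:
            raise ValueError("each reduction must be in [0, 1)")
        remaining *= 1.0 - r
    return 1.0 - remaining


def required_remaining_factor(known_reduction, target_total):
    """Remaining-cost factor the other methods must jointly supply to hit target_total."""
    return (1.0 - target_total) / (1.0 - known_reduction)


# Generic illustration with placeholder values (not figures from the study):
# two 50% reductions leave 0.5 * 0.5 = 0.25 of the cost, i.e. 75% savings.
generic = combined_savings([0.5, 0.5])

# Assumption: pair the reported 52.5% Mixtral reduction with the reported
# "up to 96%" total. The other methods would then need to leave
# 0.04 / 0.475, roughly 0.084, of the remaining compute.
factor = required_remaining_factor(0.525, 0.96)
```

Under that assumption, quantization and speculative decoding together would have to account for roughly a 12x reduction on top of Adaptive-K routing.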

Abstract

We present Adaptive-K routing, a method that dynamically selects the number of experts in Mixture-of-Experts (MoE) models based on routing entropy. Instead of using a fixed top-k experts per token, our approach uses fewer experts when the router is confident (low entropy) and more experts when uncertain (high entropy).

Results on production MoE models:

- Mixtral 8x7B: 52.5% compute reduction
- Qwen-MoE: 32.4% compute reduction
- OLMoE-1B-7B: 24.7% compute reduction

When combined with quantization and speculative decoding, we achieve up to 96% total compute savings through multiplicative composition.

Code: https://github.com/Gabrobals/sbm-efficient
PyPI: pip install adaptive-k-routing
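The entropy-guided selection described in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the API of the adaptive-k-routing package or the authors' implementation. The single threshold on normalized entropy, the threshold value 0.5, and the names `select_k` and `route` are all assumptions introduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_entropy(probs):
    """Shannon entropy (in nats) of the router's distribution over experts."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_k(probs, threshold=0.5, k_low=1, k_high=2):
    """Pick fewer experts when entropy is low, more when it is high.

    Entropy is normalized by log(num_experts) so it lies in [0, 1].
    The threshold rule and its default value are hypothetical.
    """
    n = len(probs)
    if n < 2:
        return 1
    h_norm = routing_entropy(probs) / math.log(n)
    return k_low if h_norm < threshold else k_high


def route(logits, threshold=0.5, k_low=1, k_high=2):
    """Return (expert_index, weight) pairs for one token, weights renormalized."""
    probs = softmax(logits)
    k = select_k(probs, threshold, k_low, k_high)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# A confident router (one dominant logit) activates a single expert.
confident = route([10.0] + [0.0] * 7)

# A maximally uncertain router (uniform logits) activates k_high experts.
uncertain = route([0.0] * 8)
```

In this sketch the compute saving comes from tokens that fall below the threshold, since each such token runs `k_low` expert forward passes instead of `k_high`. How the thresholds are chosen per model is not stated in this summary.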
