What question did this study set out to answer?

The study asks whether the accuracy of an ensemble for motor imagery EEG classification can be distilled into a single deployable model, so that inference latency does not grow with the number of ensemble members.

March 12, 2026 · Open Access

Entropy-Based Dual-Teacher Distillation for Efficient Motor Imagery EEG Classification

Key Points

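Aimed to improve motor imagery (MI) EEG classification accuracy through distillation while keeping inference cost at that of a single model.
Proposed an entropy-based dual-teacher distillation framework that transfers ensemble teacher knowledge to a single deployable backbone.
Introduced an entropy-gated exponential moving average (EMA) teacher to reduce the student's prediction noise.
Used a two-stage cosine annealing schedule to suppress late-stage oscillations and make final checkpoint selection more robust.
Ran experiments on two public benchmarks (BCI Competition IV-2a and IV-2b) with three backbones (EEGNet, ShallowConvNet, and ATCNet) under the subject-dependent protocol.
Achieved an average accuracy of 0.7713 on IV-2a, compared with 0.7222 for the original models and 0.7482 for the corresponding ensembles.
Achieved an accuracy of 0.8583 on IV-2b, compared with 0.8432 for the original models and 0.8529 for the ensembles.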

Abstract

Motor imagery (MI) EEG classification is a key component of noninvasive brain–computer interfaces (BCIs) and often must satisfy strict latency constraints in online or edge deployments. Although ensembling can reliably improve MI decoding accuracy, its inference cost grows linearly with the number of ensemble members, making it impractical for low-latency applications. To address this limitation, we propose an entropy-based dual-teacher distillation framework that transfers ensemble teacher knowledge to a single deployable backbone. From an information-theoretic perspective, two failure modes are common in small and noisy MI datasets: elevated predictive entropy (noisy decisions) and large fluctuations across late training epochs (unstable convergence and unreliable checkpoint selection). We therefore introduce an exponential moving average (EMA) teacher with entropy-gated activation, which acts as a low-pass filter in parameter space and reduces the student’s prediction noise. In addition, a two-stage cosine annealing schedule is employed to suppress late-stage oscillations and improve the robustness of final checkpoint selection. Experiments on two public MI benchmarks (BCI Competition IV-2a and IV-2b) with three representative backbones (EEGNet, ShallowConvNet, and ATCNet) under the subject-dependent protocol show consistent accuracy gains over the ensemble teacher and strong distillation baselines. On IV-2a, our method achieves an average accuracy of 0.7713 across the backbones, surpassing both the original models (0.7222) and the corresponding ensembles (0.7482); on IV-2b, it achieves 0.8583 versus 0.8432 (original) and 0.8529 (ensemble).
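
The three ingredients named in the abstract (predictive entropy, an EMA teacher, and a two-stage cosine schedule) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the decay value, the gate direction and threshold, and the stage split and restart learning rate are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a single softmax output."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ema_update(teacher, student, decay=0.99):
    """One EMA step over flat weight lists.

    Each teacher weight is a decayed running average of the student's
    weights, which is the sense in which EMA acts as a low-pass filter
    in parameter space. The default decay is an assumed value.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def ema_gate(probs, threshold):
    """Hypothetical entropy gate.

    Assumption: the EMA teacher is switched on when the student's
    prediction is noisy (entropy above a threshold). The abstract does
    not state the gate's direction or threshold.
    """
    return predictive_entropy(probs) > threshold


def two_stage_cosine_lr(epoch, total_epochs, split, lr_max, lr_restart, lr_min=0.0):
    """Assumed two-stage schedule.

    Stage 1: cosine decay from lr_max over epochs [0, split).
    Stage 2: cosine decay from lr_restart over [split, total_epochs).
    """
    if epoch < split:
        start, length, peak = 0, split, lr_max
    else:
        start, length, peak = split, total_epochs - split, lr_restart
    progress = (epoch - start) / length
    return lr_min + 0.5 * (peak - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]    # maximally uncertain four-class output
    confident = [0.97, 0.01, 0.01, 0.01]  # low-entropy output
    print(predictive_entropy(uniform))    # ln(4), about 1.386
    print(ema_gate(uniform, threshold=1.0), ema_gate(confident, threshold=1.0))
    print([two_stage_cosine_lr(e, 100, 70, 1e-3, 1e-4) for e in (0, 35, 70, 99)])
```

In this sketch the second stage restarts from a smaller learning rate, so late-epoch weight updates stay small. That is one plausible way to dampen late-stage oscillations, and the schedule in the paper may differ.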
