Automatic speech recognition (ASR) for Arabic poses persistent challenges due to morphological complexity, dialectal diversity, and limited annotated resources. While transformer-based models such as OpenAI’s Whisper have achieved strong baselines through transfer learning, their feed-forward sub-layers universally employ Multi-Layer Perceptrons (MLPs) with fixed activation functions, constraining both expressiveness and interpretability. This paper introduces KANWhisper, the first application of Kolmogorov-Arnold Networks (KANs) to automatic speech recognition. By replacing the MLP feed-forward layers in Whisper’s encoder and decoder with KAN layers featuring learnable B-spline activation functions, KANWhisper improves recognition accuracy while also providing intrinsic model interpretability. Extensive experiments on the Common Voice Arabic dataset demonstrate that KANWhisper achieves a word error rate (WER) of 8.02% and a character error rate (CER) of 2.78%, outperforming standard Whisper fine-tuning (8.61% WER), LoRA-adapted Whisper (8.10% WER), wav2vec2 XLSR-53 (11.50% WER), and SeamlessM4T v2-Large (13.20% WER), while using 16M fewer parameters (228M vs. 244M). Analysis of the learned activation functions reveals hierarchical specialization: lower encoder layers retain GELU-like activations for generic acoustic processing, while higher layers develop novel transformations that capture Arabic-specific phonological phenomena, including emphatic consonant distinctions. Phoneme-level evaluation demonstrates a 33.3% relative reduction in error rates for confusable Arabic emphatic consonant pairs. Layer-wise representation probing confirms that KAN-enhanced representations encode emphatic distinctions with up to 8 percentage points higher accuracy than MLP baselines.
These findings establish Kolmogorov-Arnold Networks as a viable and advantageous paradigm for speech recognition in morphologically complex languages, opening new avenues for interpretable, parameter-efficient, and accurate Arabic ASR.
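To make the central idea concrete, the following is a minimal sketch of a KAN layer with learnable B-spline activations on every input-output edge. It is not the KANWhisper implementation: the class name `KANLayer`, the helper `bspline_basis`, the grid size, the spline degree, the input range, and the residual SiLU branch are all illustrative assumptions, and the code uses plain Python lists rather than a tensor library so that it stays self-contained.

```python
import math
import random


def silu(x):
    """SiLU activation, used here as an assumed fixed residual branch."""
    return x / (1.0 + math.exp(-x))


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of `degree` at x (Cox-de Boor)."""
    # Degree 0: indicator of each half-open knot interval.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            nxt.append(left + right)
        basis = nxt
    return basis


class KANLayer:
    """Hypothetical KAN layer: one learnable spline per (input, output) edge."""

    def __init__(self, in_dim, out_dim, grid_size=5, degree=3,
                 lo=-1.0, hi=1.0, seed=0):
        rng = random.Random(seed)
        h = (hi - lo) / grid_size
        self.degree = degree
        # Uniform knots, extended by `degree` intervals on each side of [lo, hi].
        self.knots = [lo + (i - degree) * h
                      for i in range(grid_size + 2 * degree + 1)]
        n_basis = grid_size + degree
        # Spline coefficients are the learnable parameters shaping each activation.
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(n_basis)]
                      for _ in range(in_dim)] for _ in range(out_dim)]
        self.base_w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                       for _ in range(out_dim)]

    def forward(self, x):
        # Basis values depend only on the input, so compute them once per input.
        feats = [bspline_basis(xi, self.knots, self.degree) for xi in x]
        out = []
        for j in range(len(self.coef)):
            total = 0.0
            for i, xi in enumerate(x):
                spline = sum(c * b for c, b in zip(self.coef[j][i], feats[i]))
                total += self.base_w[j][i] * silu(xi) + spline
            out.append(total)
        return out


def kan_feed_forward(x, up, down):
    """Two stacked KAN layers standing in for a Linear-GELU-Linear block."""
    return down.forward(up.forward(x))
```

In this sketch the nonlinearity lives on the edges rather than in a fixed function applied after a linear map, so each edge's learned curve can be plotted and inspected directly, which is the sense in which such layers lend themselves to interpretability. Inputs are assumed to lie inside `[lo, hi)`; outside the knot range the spline term vanishes and only the SiLU branch contributes.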
Saeed et al. studied this question.
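For readers less familiar with the metrics quoted in the abstract, the sketch below shows how WER, CER, and a relative error reduction are conventionally computed from an edit distance. The function names are illustrative, and the text normalization applied in the reported experiments (for example, diacritic handling) is not specified here, so none is assumed.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference, hypothesis):
    """Character error rate: character-level edits over reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline
```

Applied to the figures above, `relative_reduction(8.61, 8.02)` gives roughly 0.069, that is, a relative WER reduction of about 6.9% over standard Whisper fine-tuning.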