What question did this study set out to answer?

This research aims to explore linguistic biases in the Whisper large-v3 ASR model when processing non-native Arabic speech compared to human perception.

March 23, 2026 | Open Access

Measuring linguistic bias in ASR: Whisper large-v3 on non-native speech versus human perception

Key Points

Compared word error rate (WER) of ASR and human listeners
Used linear mixed effects model analysis for data interpretation
Conducted phoneme error rate (PER) analysis to identify bias sources
ASR system had a WER of 66%, similar to human listeners' average WER of 67%
Higher intelligibility ratings correlated with lower WER
Higher accentedness ratings linked with higher WER
Comprehensibility showed no predictive power for WER despite a marginal positive correlation
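
The WER and PER figures above are both edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' scoring code. The function names and the toy English sentences are assumptions made for illustration, and the study's text normalization is not described in this summary.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length (works for words or phonemes)."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate for whitespace-tokenized strings."""
    return error_rate(reference.split(), hypothesis.split())


# Toy example (invented data): one substitution and one deletion over six words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Passing lists of phoneme symbols to `error_rate` instead of words gives a PER under the same definition. Because insertions count as errors, either rate can exceed 100%.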

Abstract

Although automatic speech recognition (ASR) models are advancing rapidly, they still exhibit various systematic biases. Understanding these biases can lead to fairer and more inclusive ASR pipelines. The main objective of this exploratory paper is to investigate linguistic bias in an ASR system, Whisper large-v3, when processing non-native Arabic speech, as compared to human perception of three constructs: intelligibility, comprehensibility, and foreign-accentedness. We compared word error rate (WER) across ten human listeners and the ASR system using linear mixed-effects model analysis, and conducted phoneme error rate (PER) analysis to identify potential sources of linguistic bias. The analysis revealed that the ASR system (WER = 66%) performed comparably to the human raters (average WER = 67%). There was a significant relationship between WER and intelligibility: higher intelligibility ratings were associated with lower WER. In addition, higher accentedness ratings were associated with higher WER, while comprehensibility did not predict WER despite a marginal positive association. These findings are further supported by the system’s bias toward unmarked phonemes over marked ones, such as emphatic and guttural sounds, highlighting persistent recognition challenges with acoustically complex segments. These results matter for explainable and fair ASR systems and contribute to research on ASR interpretability and explainability.
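
One way a linear mixed-effects analysis of this kind could be written down is sketched below. The abstract does not report the model formula, so the fixed-effect terms, the random intercepts for speaker $i$ and listener $j$ (human or ASR), and all symbols are assumptions for illustration only.

```latex
\mathrm{WER}_{ij} = \beta_0
  + \beta_1\,\mathrm{Intelligibility}_{i}
  + \beta_2\,\mathrm{Comprehensibility}_{i}
  + \beta_3\,\mathrm{Accentedness}_{i}
  + u_i + v_j + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\;
       v_j \sim \mathcal{N}(0, \sigma_v^2),\;
       \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this hypothetical specification, the reported results would correspond to a negative $\beta_1$, a positive $\beta_3$, and a $\beta_2$ that is not reliably different from zero.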

Cite This Study

Issa et al. studied this question.

synapsesocial.com/papers/69c0df0bfddb9876e79c150b https://doi.org/10.1016/j.procs.2026.01.080
