June 15, 2026 | Open Access

AI-based services for inclusive language learning in immersive XR environments: Speech translation, and sign language integration

Key Points

This research aims to create an inclusive platform integrating AI services for language learning that supports both deaf and hearing users.
Developed a modular platform using six AI services, including speech-to-text and sign language translation.
Evaluated system performance through technical benchmarking, load testing, and user experience assessment.
Conducted a companion pilot study with 10 participants to gauge user satisfaction and perceived effectiveness.
Platform demonstrated real-time viability in XR settings, with an average end-to-end pipeline latency of 2.05 ± 0.31 s for hearing users and 2.32 ± 0.34 s for deaf users.
AWS Polly achieved the lowest latency (50–100 ms first byte) for text-to-speech tasks.
Companion pilot study reported 92% user satisfaction and a mean experience rating of 4.6/5.0, with unanimous demand for more language support.

Abstract

Background: Extended Reality (XR) technologies offer transformative potential for language education, yet current platforms largely neglect the accessibility needs of deaf and hard-of-hearing individuals. Existing solutions typically operate in single-language environments and lack integrated support for sign language and multimodal communication. There is a critical need for inclusive platforms that serve both deaf and hearing learners through cross-modal AI services embedded in immersive environments.

Methods: This study presents a modular platform integrating six AI services: speech-to-text transcription (OpenAI Whisper), multilingual translation (Meta NLLB), text-to-speech synthesis (AWS Polly), sentiment analysis (RoBERTa), session summarisation (flan-t5-base-samsum), and International Sign (IS) translation via Google MediaPipe. An IS dataset of 750 gesture videos was processed to extract hand landmark coordinates mapped to 3D avatar animations within a Unity-based VR environment on Meta Quest 3 headsets. The system was validated through technical benchmarking of AI service performance, including comparative evaluation of text-to-speech services and multilingual translation models (NLLB-200 and EuroLLM 1.7B variants), load testing to assess platform scalability, and end-to-end pipeline latency measurement for both the hearing and the deaf user pathways. The educational scenario was additionally evaluated in a companion pilot study [50], which shares the same underlying AI services and provides complementary user-perception evidence.

Results: Technical benchmarking confirmed the platform’s viability for real-time XR deployment. TTS benchmarking confirmed AWS Polly’s lowest latency (50–100 ms first byte) at competitive cost. The EuroLLM 1.7B Instruct model achieved a BLEU score of 84.34, outperforming NLLB’s 79.25. Load testing with 1,000 simulated concurrent users demonstrated average response times under 800 milliseconds with no critical failures. Avatar animation latency for IS sign rendering remained consistently under 300 milliseconds. End-to-end pipeline latency averaged 2.05 ± 0.31 s for the hearing pathway and 2.32 ± 0.34 s for the deaf (IS) pathway, both within accepted thresholds for conversational educational use. The companion pilot (N = 10) reported a mean overall experience rating of 4.6/5.0, 92% user satisfaction, and unanimous (100%) demand for expanded language and sign-language support [50].

Conclusions: The results presented in this study focus on the technical feasibility of integrating cross-modal AI services within XR environments for accessible, multilingual language learning. The modular architecture enables independent scaling and adaptation to diverse contexts, laying the groundwork for equitable educational solutions aligned with EU digital accessibility objectives.
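As an illustration of how an end-to-end latency figure such as 2.05 ± 0.31 s could be produced, the following is a minimal sketch and not the authors' benchmarking code. It assumes the ± value denotes a sample standard deviation, which the abstract does not state. The stage functions `transcribe`, `translate`, and `synthesize` are hypothetical stand-ins for the real speech-to-text, translation, and text-to-speech services, and `run_pipeline`, `summarize`, and `benchmark` are names introduced here for illustration only.

```python
import statistics
import time
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[str], str]


# Hypothetical stand-ins for the real services. Each one only transforms a
# string so that the sketch runs without network access or model weights.
def transcribe(audio: str) -> str:
    return audio.lower()


def translate(text: str) -> str:
    return f"[translated] {text}"


def synthesize(text: str) -> str:
    return f"<audio:{text}>"


# Assumed ordering for the hearing pathway: speech -> text -> translation -> speech.
HEARING_PATHWAY: Sequence[Stage] = (transcribe, translate, synthesize)


def run_pipeline(stages: Sequence[Stage], payload: str) -> Tuple[str, float]:
    """Run the stages in order and return (final output, elapsed seconds)."""
    start = time.perf_counter()
    for stage in stages:
        payload = stage(payload)
    return payload, time.perf_counter() - start


def summarize(latencies: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation); needs at least two values."""
    return statistics.mean(latencies), statistics.stdev(latencies)


def benchmark(stages: Sequence[Stage], payloads: Sequence[str]) -> Tuple[float, float]:
    """Time every payload through the pipeline and summarize the latencies."""
    latencies: List[float] = [run_pipeline(stages, p)[1] for p in payloads]
    return summarize(latencies)


if __name__ == "__main__":
    mean_s, std_s = benchmark(HEARING_PATHWAY, ["Hello"] * 20)
    print(f"hearing pathway: {mean_s:.6f} ± {std_s:.6f} s")
```

With real services in place of the stubs, the same loop would be run once per pathway (hearing and deaf/IS) so that the two means can be compared on identical inputs.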
