What question did this study set out to answer?

The aim is to develop a personalized learning platform that utilizes AI and optical modeling to improve assessment of oral skills.

June 4, 2026 | Open Access

Design of an AI-Driven Multimodal Personalized Learning Platform Based on Optical Learning Behavior Modeling

Key Points

Re-engineering of an AI-supported learning platform to add optical articulation modeling via conventional webcams.
Integration of audio, visual, and behavioral data in a closed-loop feedback system.
Use of self-supervised and transformer-based methods for robust audio-visual representation.
Learner state predictions reached AUC values of up to 0.835.
Online A/B testing showed increased learning retention and weekly study duration after optimization.

Abstract

Current personalized college English learning systems rely largely on interaction history and audio signals. As a result, their assessment of oral skills remains vulnerable to environmental noise, microphone variability, and unclear articulation, despite recent advances in deep learning-based learner models with adaptive content delivery. To overcome these limitations, this paper re-engineers an AI-supported personalized learning platform by adding an optical articulation modeling stream captured through conventional webcams. Three sources, (i) learning behavior traces, (ii) speech audio, and (iii) optical cues such as lip/jaw movements, facial landmarks, and gaze/attention indicators, are integrated into a closed-loop pipeline: multimodal sensing -> representation learning -> learner state inference -> adaptive practice and feedback. Self-supervised and transformer-based fusion methods allow the system to represent audio-visual speech more robustly, even under noisy audio conditions, while lightweight optical pipelines (e.g., face mesh/landmark tracking) enable real-time extraction of articulation features on consumer devices. Beyond personalized vocabulary and reading practice, the improved platform offers formative pronunciation feedback that matches phoneme-level acoustic confidence with visual articulation consistency, and it measures engagement from optical attention features to better time interventions. Experiments on the platform's learning data show high-quality learner state predictions (AUC values of up to 0.835), and online A/B testing shows improved learning retention rates and longer weekly study duration after adaptive optimization. The proposed optics-enhanced design offers a viable path toward improved oral-skill assessment and responsive personalization for large-scale college English learning.
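
The pronunciation feedback described in the abstract, where phoneme-level acoustic confidence is matched with visual articulation consistency, can be illustrated with a minimal sketch. This is not the paper's method: the abstract gives no fusion rule, so the weighted average, the `audio_reliability` weight, the 0.6 threshold, and all names below are assumptions chosen only to show how an optical stream could compensate for noisy audio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhonemeObservation:
    """One scored phoneme. All fields are hypothetical, for illustration only."""
    phoneme: str
    acoustic_confidence: float  # assumed in [0, 1], from some acoustic scorer
    visual_consistency: float   # assumed in [0, 1], from lip/jaw landmark matching


def fused_score(obs: PhonemeObservation, audio_reliability: float) -> float:
    """Weighted average of the two cues (an assumed rule, not the paper's).

    audio_reliability in [0, 1] is a hypothetical estimate of how trustworthy
    the audio is; lower values shift weight toward the optical cue.
    """
    w = min(max(audio_reliability, 0.0), 1.0)  # clamp to [0, 1]
    return w * obs.acoustic_confidence + (1.0 - w) * obs.visual_consistency


def flag_phonemes(observations: List[PhonemeObservation],
                  audio_reliability: float,
                  threshold: float = 0.6) -> List[str]:
    """Return phonemes whose fused score falls below the assumed threshold."""
    return [o.phoneme for o in observations
            if fused_score(o, audio_reliability) < threshold]


observations = [
    PhonemeObservation("th", acoustic_confidence=0.3, visual_consistency=0.9),
    PhonemeObservation("r", acoustic_confidence=0.8, visual_consistency=0.4),
    PhonemeObservation("ae", acoustic_confidence=0.9, visual_consistency=0.9),
]

clean_audio_flags = flag_phonemes(observations, audio_reliability=0.9)
noisy_audio_flags = flag_phonemes(observations, audio_reliability=0.2)
```

With reliable audio the sketch flags the phoneme with low acoustic confidence. When audio is judged unreliable, the same low acoustic score is largely discounted and the phoneme with weak visual articulation is flagged instead, which is the kind of noise robustness the abstract attributes to the optical stream.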
