What question did this study set out to answer?

May 6, 2026Open Access

Parallel Bilingual Datasets: A Multimodal Deep Learning Framework for Proficiency and Style Classification

Key Points

The aim is to develop a framework for automatic proficiency and style classification of bilingual Tamil–Hindi data.
Utilized a dual-headed neural architecture for simultaneous proficiency and style prediction.
Evaluated five deep learning models using text-only, audio-only, and learnable fusion settings.
Employed a curated dataset of bilingual text and synthetic speech for multimodal experimentation.
Text-based models show strong performance in both proficiency and style classification tasks.
Audio-only model displays limited effectiveness in capturing relevant linguistic information.
Fusion models result in only marginal improvements over text-based models.

Abstract

This study presents a multimodal deep learning framework for automatic proficiency and style classification of parallel Bilingual Tamil–Hindi learner data. The proposed system employs a dual-headed neural architecture to simultaneously predict proficiency levels (Basic, Advanced) and stylistic categories (Formal, Literary) using shared feature representations. A curated dataset of bilingual text samples is utilized, along with synthetic speech generated through text-to-speech (TTS) to enable controlled multimodal experimentation. Five deep learning architectures are evaluated under text-only, audio-only, and learnable fusion settings. Experimental findings indicate that text-based models consistently achieve strong performance in both proficiency and style classification tasks. In contrast, the audio-only model demonstrates limited effectiveness, highlighting the constraints of synthetic acoustic features in capturing meaningful linguistic information. The fusion models provide only marginal improvements over text-based approaches, suggesting that textual representations play a dominant role in proficiency and stylistic classification within controlled datasets. These results emphasize the importance of linguistic features over acoustic signals for automated language assessment in low-resource settings. The proposed framework provides a scalable and reproducible approach and offers a foundation for future work incorporating real speech data and more diverse linguistic inputs.

Parallel Bilingual Datasets: A Multimodal Deep Learning Framework for Proficiency and Style Classification

Key Points

Abstract

Cite This Study