April 16, 2026 · Open Access

Applying Self-Information-Inspired Encoding to Task-Based fMRI for Decoding Second-Language Proficiency During Naturalistic Speech Listening

Key Points

This research investigates how individual differences in second-language proficiency affect neural processing of continuous speech.
Forty-three healthy participants completed naturalistic listening tasks during fMRI scanning.
Participants were divided into low-, moderate-, and high-proficiency groups based on behavioral scores.
Three temporal information-encoding frameworks derived from BOLD dynamics were evaluated.
Both categorical classification and continuous regression were applied to the data.
Exploratory ROI analysis with nonparametric statistics highlighted key brain regions.
Categorical classification showed numerical trends but did not reach statistical significance.
Multivariate regression significantly predicted continuous proficiency scores.
The bilateral orbital inferior frontal gyrus was identified as a key region related to L2 proficiency.
Self-information weighting appeared to filter background noise, helping isolate cognitive variance.
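The exploratory ROI analysis combines permutation testing (1000 permutations, per the abstract) with False Discovery Rate correction. The NumPy sketch below shows one way those two pieces fit together, assuming one scalar feature per subject per ROI and using the absolute Pearson correlation with proficiency as the test statistic. That statistic is a stand-in, since the paper's NCA statistic is not specified here, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch: |Pearson r| stands in for the paper's NCA statistic.


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))


def permutation_pvalue(feature, scores, n_perm=1000, rng=None):
    """Two-sided permutation p-value for the correlation of feature with scores.

    Shuffling the scores breaks any subject-level link while keeping both
    marginal distributions; the +1 terms count the observed statistic as
    one member of the null, so p is never exactly zero.
    """
    rng = np.random.default_rng(rng)
    observed = abs(pearson_r(feature, scores))
    null = np.array([abs(pearson_r(feature, rng.permutation(scores)))
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=43)             # simulated proficiency scores
    roi = rng.normal(size=(43, 6))           # 6 hypothetical ROI features
    roi[:, 0] += 1.5 * scores                # plant an effect in ROI 0
    pvals = [permutation_pvalue(roi[:, j], scores, rng=j) for j in range(6)]
    print(benjamini_hochberg(pvals))
```

Because the smallest attainable p-value with 1000 permutations is 1/1001, the number of permutations directly limits how many ROIs can survive a strict FDR threshold.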

Abstract

Individual differences in second-language (L2) proficiency are expected to influence how listeners parse and represent continuous speech, yet their neural signatures under naturalistic conditions remain unclear. We investigated this question using task-based fMRI during continuous speech listening. Forty-three healthy participants completed four listening runs synchronized with MRI acquisition via PsychoPy (Peirce 2007), keeping their eyes open throughout scanning. To promote sustained attention and comprehension, participants gave an oral recall in their native language after each run. Based on behavioral proficiency scores, participants were divided into low- (LP, n = 14), moderate- (MP, n = 14), and high-proficiency (HP, n = 15) groups. We evaluated three temporal information-encoding frameworks derived from BOLD dynamics: direct temporal series, functional connectivity (FC), and self-information weighted inter-subject correlation (ISC-W). Using a 10 × 5-fold nested cross-validation scheme, we tested both categorical classification of the discrete proficiency groups (LP, MP, HP) with Support Vector Machines and continuous multivariate regression of proficiency scores with Ridge and Lasso. We further applied ROI-based ANOVA and univariate Neural Correlation Analysis (NCA) to identify key brain regions, assessing significance with nonparametric permutation testing (1000 permutations) and False Discovery Rate (FDR) correction. Categorical classification yielded numerical trends, with ISC-W performing best, but did not reach statistical significance under stringent permutation testing. In contrast, multivariate regression using ISC-W features significantly predicted continuous proficiency scores (p < 0.05). Exploratory ROI analysis highlighted the bilateral orbital inferior frontal gyrus (bilateral IFGorb) as a region highly sensitive to L2 proficiency.
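The abstract reports nested cross-validation with permutation testing but not its exact layout. The sketch below assumes "10 × 5-fold" means 5-fold outer cross-validation repeated 10 times, with an inner 5-fold loop choosing the ridge penalty. The penalty grid, the use of Pearson r between held-out predictions and observed scores as the performance measure, and all function names are hypothetical. It implements closed-form ridge in NumPy rather than calling a specific library, and it leaves out the Lasso and SVM variants.

```python
import numpy as np

# Hypothetical sketch: fold layout, penalty grid and score are assumptions.


def ridge_fit(X, y, lam):
    """Closed-form ridge on centered data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def kfold_indices(n, k, rng):
    """Random partition of range(n) into k roughly equal test folds."""
    return np.array_split(rng.permutation(n), k)


def inner_select_lambda(X, y, lambdas, k, rng):
    """Pick the ridge penalty with the lowest inner-CV mean squared error."""
    folds = kfold_indices(len(y), k, rng)
    errors = []
    for lam in lambdas:
        sse = 0.0
        for test in folds:
            train = np.setdiff1d(np.arange(len(y)), test)
            w, b = ridge_fit(X[train], y[train], lam)
            sse += np.sum((X[test] @ w + b - y[test]) ** 2)
        errors.append(sse / len(y))
    return lambdas[int(np.argmin(errors))]


def nested_cv_r(X, y, lambdas=(0.1, 1.0, 10.0, 100.0), n_repeats=10,
                n_outer=5, n_inner=5, seed=0):
    """Mean Pearson r between held-out predictions and y over repeats.

    The penalty is tuned only on each outer-training split, so the outer
    test fold never influences model selection.
    """
    rng = np.random.default_rng(seed)
    rs = []
    for _ in range(n_repeats):
        pred = np.empty_like(y, dtype=float)
        for test in kfold_indices(len(y), n_outer, rng):
            train = np.setdiff1d(np.arange(len(y)), test)
            lam = inner_select_lambda(X[train], y[train], lambdas, n_inner, rng)
            w, b = ridge_fit(X[train], y[train], lam)
            pred[test] = X[test] @ w + b
        rs.append(np.corrcoef(pred, y)[0, 1])
    return float(np.mean(rs))


def permutation_test(X, y, n_perm=1000, seed=0, **cv_kwargs):
    """Observed nested-CV r and its one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = nested_cv_r(X, y, **cv_kwargs)
    null = [nested_cv_r(X, rng.permutation(y), **cv_kwargs)
            for _ in range(n_perm)]
    p = (1 + np.sum(np.asarray(null) >= observed)) / (n_perm + 1)
    return observed, float(p)
```

Rerunning the whole nested loop, including penalty selection, on every shuffled copy of the scores keeps the null distribution honest, but with 1000 permutations it dominates the runtime.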
These findings suggest that L2 proficiency is best represented as a distributed, continuous neural variable, and that self-information weighting effectively filters background noise to capture cognitive variance. Methodologically, this study provides a reproducible pipeline integrating information-theoretic feature construction with rigorous whole-brain nonparametric inference.
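The abstract does not define how self-information enters ISC-W. One hypothetical reading, sketched below, uses Shannon self-information, I(x) = -log2 p(x). For each subject, p is estimated from a histogram of the leave-one-out group-average ROI signal, and each time point is weighted in proportion to its surprisal, so frequently recurring (low-information) fluctuations contribute less to the correlation. The binning scheme, the weighted Pearson correlation, and all function names are assumptions, not the authors' pipeline.

```python
import numpy as np

# Hypothetical sketch: not the paper's ISC-W definition.


def self_information_weights(signal, n_bins=8):
    """Per-timepoint weights proportional to -log2 p(bin), summing to 1."""
    counts, edges = np.histogram(signal, bins=n_bins)
    # Digitizing against the interior edges maps each value to the same
    # bin index (0..n_bins-1) that np.histogram assigned it to.
    idx = np.digitize(signal, edges[1:-1])
    info = -np.log2(counts[idx] / signal.size)
    total = info.sum()
    if total == 0:  # all values in one bin: fall back to uniform weights
        return np.full(signal.size, 1.0 / signal.size)
    return info / total


def weighted_corr(x, y, w):
    """Pearson correlation with nonnegative weights w summing to 1."""
    mx, my = w @ x, w @ y
    dx, dy = x - mx, y - my
    return float((w @ (dx * dy)) / np.sqrt((w @ dx**2) * (w @ dy**2)))


def isc_w(data, n_bins=8):
    """Leave-one-out ISC per subject, with timepoints weighted by surprisal.

    data: array of shape (n_subjects, n_timepoints) for one ROI.
    """
    n_sub = data.shape[0]
    out = np.empty(n_sub)
    for i in range(n_sub):
        reference = np.delete(data, i, axis=0).mean(axis=0)
        w = self_information_weights(reference, n_bins)
        out[i] = weighted_corr(data[i], reference, w)
    return out
```

With uniform weights this reduces to ordinary leave-one-out ISC, which makes it easy to compare the weighted and unweighted variants on the same data.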
