ABSTRACT Working memory (WM) relies on both shared and modality‐specific neural resources, but the interplay between domain‐general and modality‐specific mechanisms remains debated. To address this issue, we employed a sequential delayed matching paradigm incorporating three types of stimuli: spatial dot locations (visual), auditory pure tones (non‐verbal auditory), and auditory consonant letters (verbal auditory). Behaviorally, auditory consonant letters yielded significantly higher accuracy than the other two stimulus types. Univariate GLM analyses revealed delay‐period activations across all conditions in a core frontoparietal network, including the dorsolateral prefrontal cortex (dlPFC), posterior parietal cortex (PPC), insula, and cerebellum. Notably, the PPC showed enhanced activation in the visuospatial condition, whereas the insula and left superior temporal regions exhibited stronger engagement during auditory tasks, highlighting modality‐specific contributions. ROI‐ and searchlight‐based multivoxel pattern analysis (MVPA) showed robust pairwise decoding but no significant three‐way classification, indicating linearly separable but partially overlapping representations within frontoparietal regions. Functional connectivity analyses revealed stimulus‐ and load‐dependent modulations: visuospatial WM enhanced connectivity within parietal–occipital networks and between parietal and frontal control regions; auditory consonant letters strengthened connectivity with language‐related temporal areas, supplementary motor area, cerebellum, and left inferior frontal gyrus; auditory pure tones showed load‐dependent connectivity with right frontal, supramarginal, and bilateral insula networks. Together, these findings demonstrate that visual and auditory WM engage a common frontoparietal control network while simultaneously recruiting distinct modality‐specific regions and connectivity patterns.
The results highlight that auditory‐verbal WM uniquely engages articulatory rehearsal circuits, supporting nested domain‐specific mechanisms within a shared attentional architecture. Overall, this study provides converging evidence for the integration of modality‐specific and domain‐general components of WM, reconciling multi‐component and embedded‐process models.
Wang et al. (Fri,) studied this question.