Deep neural networks (DNNs) are a leading computational framework for understanding neural visual processing. A standard approach for evaluating their similarity to brain function uses DNN activations to predict human neural responses to the same images, yet which visual properties drive this alignment remains unclear. Here, we show that texture-like representations, operationalized as global summaries of local image statistics, largely underlie this alignment. We recorded electroencephalography (EEG) from 57 participants viewing three image types: natural scenes, texture-synthesized versions that preserve global summaries of local statistics while disrupting global form, and isolated objects without backgrounds. Representational similarity analysis showed the strongest DNN-EEG alignment when both systems processed texture-synthesized images. When features from one image condition were used to predict EEG responses to another, features from texture-synthesized images generalized to natural scenes. Crucially, we observed a dissociation between DNN-EEG alignment and decodable object category information: alignment increased for texture-synthesized images even when object information was reduced. Together, our findings identify global summaries of local image statistics as a common currency linking DNNs and human visual processing, clarifying that global form features are not required for high DNN-EEG alignment. Our findings highlight the shared importance of local image statistics in artificial and biological visual systems.
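To make the representational similarity analysis mentioned above concrete, the following is a minimal sketch of how a DNN-EEG alignment score could be computed. It is not the authors' pipeline: the correlation-distance dissimilarity, the Spearman-style comparison of dissimilarity matrices, the function names (`rdm`, `rsa_alignment`), and the synthetic data are all assumptions made for illustration. The abstract does not specify these choices, and real EEG analyses are typically time-resolved rather than based on one response vector per image.

```python
import numpy as np


def rdm(responses):
    """Representational dissimilarity matrix (assumed metric: 1 - Pearson r).

    responses: array of shape (n_images, n_features), one row per image.
    """
    return 1.0 - np.corrcoef(responses)


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries as a flat vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def rank(values):
    """Ordinal ranks (ties are not averaged; adequate for continuous data)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def rsa_alignment(model_features, neural_responses):
    """Rank correlation between the model RDM and the neural RDM."""
    model_ranks = rank(upper_triangle(rdm(model_features)))
    neural_ranks = rank(upper_triangle(rdm(neural_responses)))
    return float(np.corrcoef(model_ranks, neural_ranks)[0, 1])


# Synthetic stand-ins (hypothetical data, not the study's recordings):
# both "systems" are noisy projections of the same latent image properties.
rng = np.random.default_rng(0)
n_images = 20
latent = rng.normal(size=(n_images, 5))
dnn = latent @ rng.normal(size=(5, 50)) + 0.1 * rng.normal(size=(n_images, 50))
eeg = latent @ rng.normal(size=(5, 64)) + 0.1 * rng.normal(size=(n_images, 64))
unrelated = rng.normal(size=(n_images, 64))

shared = rsa_alignment(dnn, eeg)          # systems sharing latent structure
baseline = rsa_alignment(dnn, unrelated)  # no shared structure
print(f"shared structure: {shared:.2f}, unrelated: {baseline:.2f}")
```

Under these assumptions, the cross-condition analysis described in the abstract would amount to computing the model RDM from features of one image condition and the neural RDM from responses to another, then comparing them in the same way.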
Loke et al. (Thu,) studied this question.