What question did this study set out to answer?

The research aims to enhance ADHD prediction from neuroimaging by using a Vision Transformer framework.

April 22, 2026 | Open Access

ADHD prediction from individual-space T1 images using a Vision Transformer with a gross-region grid framework

Key Points

The study used individual-space T1-weighted MRI, comparing whole-brain axial slices with 11 representative slices.
The Vision Transformer was compared against a baseline CNN, ResNet, and a conventional ROI-based approach.
SHAP-based interpretability analysis was used to identify structural markers.
The Vision Transformer achieved the highest AUC, significantly outperforming the baseline CNN and the ROI-based approach while performing comparably to ResNet.
Moving from whole-brain slices to 11 representative slices (AUC 0.75) produced no statistically significant drop in performance (p = 0.19).
SHAP analysis highlighted the precentral gyrus and occipital regions as key neuroanatomical regions for ADHD classification.
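
The AUC used to compare the models can be illustrated with a short sketch. The study does not say how its AUC values or significance tests were computed, so the function below is only an assumption-laden illustration: it uses the standard rank-based definition of AUC (the fraction of patient/control pairs that the classifier orders correctly, with ties counted as half). The name `roc_auc` and the toy labels and scores are hypothetical.

```python
import numpy as np


def roc_auc(labels, scores):
    """Rank-based AUC: probability that a positive case outscores a negative one.

    Illustrative helper only; not the evaluation code used in the study.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]    # scores of positive cases (e.g. ADHD)
    neg = scores[~labels]   # scores of negative cases (e.g. controls)
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Toy data (made up for illustration, not from the study).
toy_labels = [0, 0, 1, 1, 1, 0]
toy_scores = [0.2, 0.6, 0.7, 0.4, 0.9, 0.1]
print(f"AUC on toy data: {roc_auc(toy_labels, toy_scores):.3f}")
```

An AUC of 0.5 corresponds to chance-level ordering and 1.0 to perfect separation. Testing whether two AUCs differ requires a separate statistical procedure that this sketch does not cover.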

Abstract

Predicting attention-deficit/hyperactivity disorder (ADHD) from neuroimaging remains challenging due to heterogeneous brain morphology. In this study, we proposed an end-to-end framework using Vision Transformer (ViT) models to directly learn discriminative features from individual-space T1-weighted MRI. We evaluated two anatomical coverage patterns to assess the impact of data reduction and spatial granularity: (1) whole-brain (WB) axial slices and (2) 11 representative slices (R11). Our results demonstrated that the ViT achieved the highest numerical AUC, significantly outperforming the baseline CNN and the conventional ROI-based approach, while performing comparably to ResNet. Notably, the transition from WB to R11 (AUC 0.75) showed no statistically significant degradation in performance (p = 0.19), proving that high diagnostic integrity can be maintained even with substantial anatomical data reduction. Interpretability analysis via SHAP, applied to the R11 configuration, identified consistent high-impact spatial clusters across anatomical axes. Specifically, the precentral gyrus and occipital regions emerged as robust neuroanatomical substrates for ADHD classification. These findings suggest that transformer-based self-attention effectively integrates distributed morphological variations across sensorimotor and visual processing networks, providing an anatomically coherent approach to ADHD diagnosis.

• We propose an end-to-end framework for ADHD prediction using individual-space T1-weighted MRI.
• Vision Transformer (ViT) models directly learn discriminative features from whole-brain and representative slices.
• The ViT model significantly outperforms baseline CNNs and conventional ROI-based approaches.
• SHAP analysis highlights potential structural markers within the motor and sensory networks.
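
To make the data reduction from whole-brain slices to a representative subset more concrete, here is a minimal sketch. The abstract does not state how the 11 slices were chosen or how they were passed to the model, so everything below is an assumption for illustration: slices are taken at evenly spaced positions along the axial axis and tiled into a single 2D grid image. The function names `representative_slices` and `tile_grid`, the 4-column layout, and the random stand-in volume are all hypothetical.

```python
import numpy as np


def representative_slices(volume, n_slices=11, axis=2):
    """Pick n_slices evenly spaced slices along one axis (assumed strategy).

    Returns an array of shape (H, W, n_slices) and the chosen indices.
    """
    depth = volume.shape[axis]
    if n_slices > depth:
        raise ValueError("n_slices exceeds the number of available slices")
    idx = np.linspace(0, depth - 1, n_slices).round().astype(int)
    picked = np.take(volume, idx, axis=axis)
    # Move the slice axis last so the result is always (H, W, n_slices).
    return np.moveaxis(picked, axis, -1), idx


def tile_grid(slices, n_cols=4):
    """Tile (H, W, N) slices row by row into one 2D image, zero-padding unused cells."""
    h, w, n = slices.shape
    n_rows = -(-n // n_cols)  # ceiling division
    grid = np.zeros((n_rows * h, n_cols * w), dtype=slices.dtype)
    for k in range(n):
        r, c = divmod(k, n_cols)
        grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = slices[:, :, k]
    return grid


# Random stand-in for a T1-weighted volume; the shape is arbitrary.
rng = np.random.default_rng(0)
volume = rng.random((64, 64, 120)).astype(np.float32)
slices, idx = representative_slices(volume)
grid = tile_grid(slices)
print(idx, grid.shape)
```

With 11 slices and 4 columns, the grid has 3 rows and one empty cell. A real pipeline would also need intensity normalization and resizing to the model's input resolution, neither of which is described in the abstract.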
