Predicting attention-deficit/hyperactivity disorder (ADHD) from neuroimaging remains challenging due to heterogeneous brain morphology. In this study, we proposed an end-to-end framework using Vision Transformer (ViT) models to learn discriminative features directly from individual-space T1-weighted MRI. We evaluated two anatomical coverage patterns to assess the impact of data reduction and spatial granularity: (1) whole-brain (WB) axial slices and (2) 11 representative slices (R11). The ViT achieved the highest numerical AUC, significantly outperforming the baseline CNN and the conventional ROI-based approach, while performing comparably to ResNet. Notably, the transition from WB to R11 (AUC 0.75) showed no statistically significant degradation in performance (p = 0.19), suggesting that diagnostic performance can be largely maintained even with substantial anatomical data reduction. Interpretability analysis via SHAP, applied to the R11 configuration, identified consistent high-impact spatial clusters across anatomical axes. Specifically, the precentral gyrus and occipital regions emerged as robust neuroanatomical substrates for ADHD classification. These findings suggest that transformer-based self-attention effectively integrates distributed morphological variations across sensorimotor and visual processing networks, providing an anatomically coherent approach to ADHD diagnosis.

• We propose an end-to-end framework for ADHD prediction using individual-space T1-weighted MRI.
• Vision Transformer (ViT) models directly learn discriminative features from whole-brain and representative slices.
• The ViT model significantly outperforms baseline CNNs and conventional ROI-based approaches.
• SHAP analysis highlights potential structural markers within the motor and sensory networks.
Maeda et al. studied this question.
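The two coverage patterns compared in the abstract (WB and R11) can be illustrated with a minimal sketch. The abstract does not say how the 11 representative slices were chosen, so the even spacing along the axial axis used here is purely an assumption, as are the function name `select_axial_slices`, the axis convention, and the synthetic volume shape.

```python
import numpy as np


def select_axial_slices(volume, n_slices=None, axis=2):
    """Return axial slices of a 3D volume as an array of shape (k, H, W).

    n_slices=None keeps every slice (whole-brain, WB).
    n_slices=11 keeps 11 evenly spaced slices (an ASSUMED stand-in for R11;
    the actual selection rule is not given in the abstract).
    """
    slices = np.moveaxis(volume, axis, 0)  # put the axial axis first
    if n_slices is None:
        return slices
    depth = slices.shape[0]
    # Evenly spaced indices from the first to the last slice (assumption).
    idx = np.round(np.linspace(0, depth - 1, n_slices)).astype(int)
    return slices[idx]


# Synthetic stand-in for a T1-weighted volume (hypothetical shape).
volume = np.random.default_rng(0).random((64, 64, 40))

wb = select_axial_slices(volume)                # all 40 axial slices
r11 = select_axial_slices(volume, n_slices=11)  # 11 evenly spaced slices
print(wb.shape, r11.shape)
```

Each resulting 2D slice could then be passed to an image classifier such as a ViT. How slice-level outputs are aggregated into a subject-level prediction is not specified in the abstract and is left out of this sketch.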