What question did this study set out to answer?

This work presents XAI2Brain, a conceptual framework that positions mechanistic interpretability as the link between AI and neuroscience.

June 20, 2026 · Open Access

XAI2Brain: A Perspective on Mechanistic Interpretability for Brain–AI Alignment

Key points

Introduces a multi-level conceptual framework for brain–AI alignment, spanning representation-level, mechanism-level, and interaction-level perspectives.
Surveys XAI methodologies, from feature attribution and concept-based explanations to mechanistic and human-centric interpretability approaches.
Outlines the challenges of scaling explainability to large generative and multimodal models, and the limitations of current mechanistic interpretability methods.
Proposes a research roadmap towards interpretable, adaptive AI systems inspired by neuroscience principles.
Highlights the importance of interactive human-AI collaboration for understanding and guiding AI systems.
Identifies open challenges such as explanation instability and lack of causal guarantees in interpretability methods.

Abstract

The convergence of artificial intelligence (AI), explainable AI (XAI), and neuroscience is fostering new opportunities for understanding both machine and biological intelligence through interpretable and human-centered learning paradigms. In this Perspective, we introduce XAI2Brain as a conceptual framework for brain–AI alignment, positioning mechanistic interpretability as an intermediate layer connecting neural network representations, human understanding, and neuroscience-inspired AI design. Rather than viewing XAI solely as a post hoc transparency tool, we emphasize its emerging role in enabling mechanistic analysis of internal model representations, concept-level reasoning, and interactive human–AI alignment. We define XAI2Brain as a multi-level conceptual framework rather than a deployable system, explicitly aimed at structuring brain–AI alignment across representation-level, mechanism-level, and interaction-level perspectives. We survey the evolution of XAI methodologies—from feature attribution and concept-based explanations to mechanistic and human-centric interpretability approaches—and discuss how these methods may support bidirectional knowledge transfer between AI systems and cognitive neuroscience. Importantly, we adopt a cautious stance on brain–AI analogy, explicitly recognizing that artificial neural representations are not equivalent to biological neural representations, and instead focusing on functional and informational correspondences rather than structural equivalence. Unlike conventional human-in-the-loop or reinforcement learning from human feedback paradigms that primarily optimize behavioral outputs, XAI2Brain focuses on cognitively interpretable and mechanistically grounded alignment between AI systems and human reasoning processes. 
This alignment promotes interactive human-in-the-loop intelligence, empowering humans to comprehend, guide, and refine AI systems, while enabling AI systems to better interpret human instructions, intentions, and contextual reasoning. We further discuss the challenges of scaling explainability to large generative and multimodal models, including issues of interpretability robustness, cognitive compatibility, evaluation, and ethical accountability. We also highlight key limitations of current mechanistic interpretability methods, including explanation instability, representation superposition, and lack of causal guarantees, underscoring that these challenges remain open research problems. Rather than proposing a complete artificial brain architecture, this Perspective outlines a research roadmap toward more interpretable, adaptive, and neuroscience-inspired AI systems capable of supporting future brain–AI integration and collaborative intelligence. We additionally clarify that this work follows a narrative perspective review methodology with structured thematic synthesis of the literature. By framing explainability as a bridge between mechanistic AI understanding, cognitive science, and human-centered interaction, XAI2Brain highlights the importance of interpretable alignment for the next generation of brain-inspired AI systems.

Cite This Study

Jiang et al. studied this question.

synapsesocial.com/papers/6a36300ddb0793dc1a5375ba https://doi.org/10.3390/make8060167
