Explainable Generative AI: A Two-Stage Review of Existing Techniques and Future Research Directions
Abstract
Generative Artificial Intelligence (GenAI) models produce increasingly sophisticated outputs, yet their underlying mechanisms remain opaque. To clarify how explainability is conceptualized and implemented in GenAI research, this two-stage review systematically examined 261 articles retrieved from six major databases. After duplicate removal and the application of predefined inclusion criteria, 63 articles were retained for full analysis. In the first stage, an umbrella review synthesized insights from 18 review papers to identify prevailing frameworks, strategies, and conceptual challenges surrounding explainability in GenAI. In the second stage, an empirical review analyzed 45 primary studies to assess how explainability is operationalized, evaluated, and applied in practice. Across both stages, the findings reveal fragmented approaches, a lack of standardized evaluation frameworks, and persistent challenges, including limited generalizability, interpretability–performance trade-offs, and high computational costs. The review concludes by outlining future research directions aimed at developing user-centric, regulation-aware explainability methods tailored to the unique architectures and application contexts of GenAI. By consolidating theoretical and empirical evidence, this study establishes a comprehensive foundation for advancing transparent, interpretable, and trustworthy GenAI systems.
Key Points
Objective
The aim is to clarify how explainability is conceptualized and implemented in generative AI research.
Methods
- Conducted a two-stage review of 261 articles from six databases.
- Performed an umbrella review of 18 existing review papers.
- Conducted an empirical review analyzing 45 primary studies on explainability.
Results
- Identified fragmented approaches to explainability in generative AI.
- Found a lack of standardized evaluation frameworks for explainability.
- Highlighted persistent challenges like limited generalizability and interpretability–performance trade-offs.