As deep learning models become increasingly integrated into critical decision-making systems, the need for explainable Artificial Intelligence (xAI) has become paramount to ensure transparency, accountability, and trust. Post hoc explainability methods, which analyse trained models to interpret their predictions without modifying the underlying architecture, have become increasingly important, especially in fields such as healthcare and finance. However, modern xAI techniques often produce feature importance rankings that fail to capture the true causal influence of features, particularly in transformer-based models. Recent quantitative metrics, such as Symmetric Relevance Gain (SRG), which measures the area between the feature corruption performance curves of the Most Important Feature (MIF) and the Least Important Feature (LIF), provide a more rigorous basis for evaluating explanation fidelity. In this study, we first show that existing xAI methods exhibit consistently poor performance under the SRG criterion when explaining transformer-based text classifiers. To address these limitations, we introduce EvoDropX, a novel framework that formulates explanation as an optimisation problem. EvoDropX leverages Grammatical Evolution (GE) to evolve sequences of feature corruption with the explicit objective of maximising SRG, thereby identifying features that most strongly influence model predictions. EvoDropX provides interventional, input–output (behavioural) explanations and does not attempt to infer or interpret internal model mechanisms.
Through comprehensive experiments across multiple datasets (IMDb movie reviews (IMDB), Stanford Sentiment Treebank (SST-2), Amazon Polarity (AP)), multiple transformer models (Bidirectional Encoder Representations from Transformers (BERT), RoBERTa, DistilBERT), and multiple metrics (SRG, MIF, LIF, Counterfactual Conciseness (CFC)), we demonstrate that EvoDropX significantly outperforms all state-of-the-art (SOTA) xAI baselines, including Attention-Aware Layer-Wise Relevance Propagation for Transformers (AttnLRP), SHapley Additive exPlanations (SHAP), and Local Interpretable Model-agnostic Explanations (LIME), when evaluated using intervention-based faithfulness criteria. Notably, EvoDropX achieves a 74.77% improvement in SRG over the best-performing baseline on the IMDB dataset with the BERT model, with consistent improvements observed across all dataset-model pairs. Finally, qualitative and linguistic analyses reveal that EvoDropX captures both sentiment-bearing terms and their structural relationships within sentences, yielding explanations that are both faithful and interpretable.
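To make the SRG criterion concrete, the following is a minimal sketch of how the area between the MIF and LIF corruption curves might be computed. It is not the authors' implementation. It assumes that each curve is a list of model performance scores recorded at evenly spaced corruption fractions from 0 to 1, that areas are taken with the trapezoidal rule, and that SRG is signed as the LIF area minus the MIF area (so a faithful ranking gives a positive value). The function names `area_under_curve` and `symmetric_relevance_gain` are hypothetical.

```python
from typing import Sequence


def area_under_curve(scores: Sequence[float]) -> float:
    """Trapezoidal area of a curve sampled at evenly spaced points on [0, 1].

    Assumption: scores[i] is the model's performance after corrupting a
    fraction i / (len(scores) - 1) of the input features.
    """
    if len(scores) < 2:
        raise ValueError("a curve needs at least two points")
    step = 1.0 / (len(scores) - 1)
    return sum((a + b) / 2.0 * step for a, b in zip(scores, scores[1:]))


def symmetric_relevance_gain(
    mif_curve: Sequence[float], lif_curve: Sequence[float]
) -> float:
    """Area between the LIF and MIF corruption curves (assumed sign: LIF - MIF).

    mif_curve: performance when the most important features are corrupted first.
    lif_curve: performance when the least important features are corrupted first.
    """
    if len(mif_curve) != len(lif_curve):
        raise ValueError("both curves must be sampled at the same points")
    return area_under_curve(lif_curve) - area_under_curve(mif_curve)


# Toy curves: corrupting important features first degrades performance sooner.
mif = [1.0, 0.5, 0.0]
lif = [1.0, 1.0, 0.0]
srg = symmetric_relevance_gain(mif, lif)
print(srg)
```

Under these assumptions, a ranking that orders features well produces a MIF curve that falls quickly and a LIF curve that stays high, so the gap between them (and hence SRG) is large; a random ranking would make the two curves similar and push SRG towards zero.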
Dhiraj Kumar Singh
Conor Ryan
Singh et al. (Mon,) studied this question.
www.synapsesocial.com/papers/69e1cecc5cdc762e9d857c26 — DOI: https://doi.org/10.34961/19358