What question did this study set out to answer?

The aim is to enhance the explainability of transformer models by optimizing feature corruption sequences using a novel framework called EvoDropX.

April 17, 2026 · Open Access


EvoDropX: Evolutionary optimization of feature corruption sequences for faithful explanations of transformer models

Dhiraj Kumar Singh, Indian Institute of Technology Indore; Conor Ryan, University of Limerick

Key Points

The aim is to enhance the explainability of transformer models by optimizing feature corruption sequences using a novel framework called EvoDropX.
Formulated explanation as an optimization problem, solved by EvoDropX, to improve explanation fidelity.
Used Grammatical Evolution (GE) to evolve feature corruption sequences that maximize Symmetric Relevance Gain (SRG).
Conducted experiments on the IMDb, Stanford Sentiment Treebank (SST-2), and Amazon Polarity datasets with BERT, RoBERTa, and DistilBERT models.
Compared EvoDropX against state-of-the-art xAI methods, including AttnLRP, SHAP, and LIME.
EvoDropX significantly outperforms existing xAI methods under the SRG criterion.
Achieved a 74.77% improvement in SRG over the best-performing baseline on the IMDb dataset with the BERT model.
Consistent performance improvements observed across all dataset-model pairs.
Qualitative analyses indicate EvoDropX identifies sentiment-bearing terms and their relationships effectively.
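The key points describe evolving feature corruption sequences to maximize SRG. The sketch below illustrates that kind of search loop under stated assumptions, none of which come from the paper: the "model" is a hypothetical bag-of-words logistic scorer (`WEIGHTS`, `BIAS`) standing in for a transformer, corrupting a feature means dropping a token, SRG is taken as the area under the LIF curve minus the area under the MIF curve, and the optimizer is a plain (mu + lambda) evolutionary loop with swap mutation over permutations. That optimizer is swapped in for Grammatical Evolution, which instead maps integer genomes to candidate sequences through a grammar.

```python
import math
import random

# Hypothetical toy classifier standing in for a transformer: a bag-of-words
# logistic model. Dropping (corrupting) a token removes its weight from the logit.
WEIGHTS = {"superb": 1.5, "plot": 0.1, "boring": -1.0}
BIAS = 0.0


def prob_positive(tokens, masked):
    """P(positive) with the tokens at the indices in `masked` dropped."""
    z = BIAS + sum(WEIGHTS.get(t, 0.0) for i, t in enumerate(tokens) if i not in masked)
    return 1.0 / (1.0 + math.exp(-z))


def confidence(tokens, masked):
    """Confidence in the class the model predicts on the uncorrupted input."""
    p = prob_positive(tokens, masked)
    return p if prob_positive(tokens, set()) >= 0.5 else 1.0 - p


def corruption_curve(tokens, order):
    """Confidence after corrupting 0, 1, ..., n tokens in the given order."""
    masked, curve = set(), [confidence(tokens, set())]
    for idx in order:
        masked.add(idx)
        curve.append(confidence(tokens, masked))
    return curve


def area(curve):
    """Trapezoidal area under a curve sampled at evenly spaced points on [0, 1]."""
    n = len(curve) - 1
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:])) / n


def srg(tokens, order):
    """Assumed SRG: area(LIF) - area(MIF); MIF corrupts `order` front to back, LIF back to front."""
    mif = corruption_curve(tokens, order)
    lif = corruption_curve(tokens, list(reversed(order)))
    return area(lif) - area(mif)


def mutate(order, rng):
    """Swap two positions, so every child is still a valid permutation."""
    child = order[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def evolve(tokens, generations=200, pop_size=20, seed=0):
    """(mu + lambda) search over corruption orders maximizing SRG (not Grammatical Evolution)."""
    rng = random.Random(seed)
    n = len(tokens)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # Elitist survivor selection: keep the best pop_size of parents plus children.
        population = sorted(population + children,
                            key=lambda o: srg(tokens, o), reverse=True)[:pop_size]
    return population[0]


tokens = "the acting was superb but the plot felt boring".split()
best = evolve(tokens)
print([tokens[i] for i in best], round(srg(tokens, best), 4))
```

Because every child is produced by swapping two positions, the search never leaves the space of valid corruption orders. For this additive toy model the best order simply sorts tokens by how strongly they support the predicted class, which makes it a convenient sanity check rather than evidence about how the method behaves on transformers.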

Abstract

As deep learning models become increasingly integrated into critical decision-making systems, the need for explainable Artificial Intelligence (xAI) has grown paramount to ensure transparency, accountability, and trust. Post hoc explainability methods, which analyse trained models to interpret their predictions without modifying the underlying architecture, have become increasingly important, especially in fields such as healthcare and finance. Modern xAI techniques often produce feature importance rankings that fail to capture the true causal influence of features, particularly in transformer-based models. Recent quantitative metrics, such as Symmetric Relevance Gain (SRG), which measures the area between the feature corruption performance curves of the Most Important Feature (MIF) and the Least Important Feature (LIF), provide a more rigorous basis for evaluating explanation fidelity. In this study, we first show that existing xAI methods exhibit consistently poor performance under the SRG criterion when explaining transformer-based text classifiers. To address these limitations, we introduce EvoDropX, a novel framework that formulates explanation as an optimisation problem. EvoDropX leverages Grammatical Evolution (GE) to evolve sequences of feature corruption with the explicit objective of maximising SRG, thereby identifying features that most strongly influence model predictions. EvoDropX provides interventional, input–output (behavioural) explanations and does not attempt to infer or interpret internal model mechanisms. 
Through comprehensive experiments across multiple datasets (IMDb movie reviews (IMDB), Stanford Sentiment Treebank (SST-2), Amazon Polarity (AP)), multiple transformer models (Bidirectional Encoder Representations from Transformers (BERT), RoBERTa, DistilBERT), and multiple metrics (SRG, MIF, LIF, Counterfactual Conciseness (CFC)), we demonstrate that EvoDropX significantly outperforms all state-of-the-art (SOTA) xAI baselines, including Attention-Aware Layer-Wise Relevance Propagation for Transformers (AttnLRP), SHapley Additive exPlanations (SHAP), and Local Interpretable Model-agnostic Explanations (LIME), when evaluated using intervention-based faithfulness criteria. Notably, EvoDropX achieves a 74.77% improvement in SRG over the best-performing baseline on the IMDB dataset with the BERT model, with consistent improvements observed across all dataset-model pairs. Finally, qualitative and linguistic analyses reveal that EvoDropX captures both sentiment-bearing terms and their structural relationships within sentences, yielding explanations that are both faithful and interpretable.
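The abstract defines SRG in words, as the area between the MIF and LIF feature corruption curves. One way to write that down is given below. It assumes that y^MIF_k and y^LIF_k are the model's confidence in its original prediction after corrupting k of K features (most important first for MIF, least important first for LIF), that the corruption axis is rescaled to [0, 1], and that the integral is approximated with the trapezoidal rule. This notation and normalization are illustrative and may differ from the paper's exact definition.

```latex
\mathrm{SRG} = \int_{0}^{1} \left( y^{\mathrm{LIF}}(x) - y^{\mathrm{MIF}}(x) \right) dx
\approx \frac{1}{K} \sum_{k=0}^{K-1} \frac{\Delta_k + \Delta_{k+1}}{2},
\qquad \Delta_k = y^{\mathrm{LIF}}_k - y^{\mathrm{MIF}}_k
```

Under this reading, a faithful ranking makes the MIF curve collapse quickly while the LIF curve stays high, so a larger SRG indicates a more faithful explanation. This is consistent with EvoDropX treating SRG as a quantity to maximize.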
