What question did this study set out to answer?

To develop a framework that combines interpretability with deep reinforcement learning for autonomous driving decisions.

March 25, 2026 · Open Access

Hybrid Attribution-Based Interpretable Deep Reinforcement Learning for Autonomous Driving Behavior Decision-Making

Key Points

Proposed Hybrid Attribution-based Interpretable Deep Reinforcement Learning framework (HA-IDRL).
Introduced Hybrid Gradient–LRP attribution for better interpretability.
Replaced multilayer perceptron with Kolmogorov–Arnold Networks for structural interpretability.
Tested framework on a highway lane-changing task using the highway-env simulator.
Achieved robust decision-making performance comparable to Dueling DQN and SAC.
Provided stable explanations aligned with human driving semantics.
Maintained low computational overhead for real-time interpretability.
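
The structural idea in the points above, a Dueling DQN head built from KAN-style layers, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class `KANLayer`, the function `dueling_q`, the Gaussian basis functions (the original KAN formulation uses B-splines), and all sizes are assumptions made for the example. The dueling aggregation Q = V + A - mean(A) is the standard Dueling DQN rule.

```python
import numpy as np

class KANLayer:
    """Toy KAN-style layer: out_j = sum_i phi_ji(x_i).

    Each edge function phi_ji is a learnable univariate function, here a
    weighted sum of fixed Gaussian bumps (an assumption for brevity; the
    original KAN formulation uses B-splines).
    """

    def __init__(self, n_in, n_out, n_basis=8, x_min=-1.0, x_max=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(x_min, x_max, n_basis)        # shape (K,)
        self.width = (x_max - x_min) / (n_basis - 1)
        self.coef = rng.normal(scale=0.1, size=(n_out, n_in, n_basis))

    def edge_outputs(self, x):
        """Return phi_ji(x_i) for every edge, shape (n_out, n_in)."""
        basis = np.exp(-((x[:, None] - self.centers[None, :]) / self.width) ** 2)
        return np.einsum("jik,ik->ji", self.coef, basis)

    def __call__(self, x):
        # Explicit summation structure: each output is a plain sum of edge terms.
        return self.edge_outputs(x).sum(axis=1)


def dueling_q(features, value_head, adv_head):
    """Standard dueling aggregation: Q = V + A - mean(A)."""
    v = value_head(features)    # shape (1,)
    a = adv_head(features)      # shape (n_actions,)
    return v + a - a.mean()


# Hypothetical sizes: 4 input features, 5 discrete driving actions.
x = np.array([0.2, -0.5, 0.9, 0.0])
value_head = KANLayer(4, 1, seed=1)
adv_head = KANLayer(4, 5, seed=2)
q_values = dueling_q(x, value_head, adv_head)
```

Because each output is a sum of per-input terms, `adv_head.edge_outputs(x)` shows directly how much each input feature contributes to each action's advantage, which is the kind of functional decomposition the structural-interpretability claim refers to.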

Abstract

With the increasing deployment of autonomous driving systems, the opaque nature of deep reinforcement learning (DRL) decision models hinders understanding and validation of driving decisions. To address this challenge, we propose a Hybrid Attribution-based Interpretable Deep Reinforcement Learning framework (HA-IDRL) for autonomous driving behavior decision-making. The framework introduces a Hybrid Gradient–LRP (HGL) attribution mechanism that integrates gradient-based attribution and Layer-wise Relevance Propagation (LRP) to capture complementary sensitivity and contribution information, producing more consistent and comprehensive post hoc explanations. In addition to post hoc interpretability, we enhance structural interpretability by replacing the conventional multilayer perceptron (MLP) in the Dueling Deep Q-Network (Dueling DQN) architecture with Kolmogorov–Arnold Networks (KAN). By representing nonlinear interactions through learnable univariate functions and explicit summation structures, KAN provides inherently interpretable functional decompositions. The proposed framework is evaluated on a highway lane-changing task using the highway-env simulator. Experimental results show that HA-IDRL achieves decision-making performance comparable to representative DRL baselines, including Dueling DQN and Soft Actor-Critic (SAC), while providing explanations that are more stable and better aligned with human driving semantics. Moreover, the proposed method produces explanations with low computational overhead, enabling efficient and real-time interpretability in practical autonomous driving applications. Overall, HA-IDRL advances trustworthy autonomous driving by enabling high-performance decision-making and rigorous, multi-level interpretability, thereby improving the transparency and operational reliability of DRL-based driving policies.
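
To make the idea of combining sensitivity (gradient) and contribution (LRP) information concrete, the sketch below computes both attributions for one action of a small ReLU network and blends them. It is illustrative only: the abstract does not state the HGL fusion rule, so the convex combination of L1-normalized magnitudes and the weight `alpha` are assumptions, a plain MLP stands in for the Q-network, and all function names are hypothetical.

```python
import numpy as np

def forward(x, weights, biases):
    """Run a ReLU MLP (linear last layer) and keep every layer's activations."""
    acts = [x]
    for k, (W, b) in enumerate(zip(weights, biases)):
        z = W @ acts[-1] + b
        acts.append(z if k == len(weights) - 1 else np.maximum(z, 0.0))
    return acts

def gradient_attribution(acts, weights, action):
    """Sensitivity: dQ[action]/dx, computed by manual backpropagation."""
    grad = np.zeros(weights[-1].shape[0])
    grad[action] = 1.0
    for k in range(len(weights) - 1, -1, -1):
        if k < len(weights) - 1:
            grad = grad * (acts[k + 1] > 0)    # ReLU derivative of hidden layer k
        grad = weights[k].T @ grad
    return grad

def lrp_epsilon(acts, weights, biases, action, eps=1e-9):
    """Contribution: LRP-epsilon relevance of each input for Q[action]."""
    rel = np.zeros(weights[-1].shape[0])
    rel[action] = acts[-1][action]
    for k in range(len(weights) - 1, -1, -1):
        z = weights[k] @ acts[k] + biases[k]
        z = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabilizer, avoids division by zero
        rel = acts[k] * (weights[k].T @ (rel / z))
    return rel

def hybrid_attribution(x, weights, biases, action, alpha=0.5):
    """ASSUMED fusion rule: convex mix of L1-normalized |gradient| and |LRP|."""
    acts = forward(x, weights, biases)
    g = np.abs(gradient_attribution(acts, weights, action))
    r = np.abs(lrp_epsilon(acts, weights, biases, action))
    g = g / (g.sum() + 1e-12)
    r = r / (r.sum() + 1e-12)
    return alpha * g + (1.0 - alpha) * r
```

For a ReLU network with a very small `eps`, the LRP relevance coincides with gradient times input, so in this toy setting the two signals differ mainly in whether the input magnitude is taken into account. A feature with a large gradient but a near-zero value scores high on sensitivity and low on contribution, which is one way to read the "complementary" information the abstract mentions.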


Cite This Study

Liu et al. studied this question.

synapsesocial.com/papers/69c37b74b34aaaeb1a67dd1f https://doi.org/10.3390/app16063096
