August 14, 2025 | Open Access

Latent Mamba-DQN: Improving Temporal Dependency Modeling in Deep Q-Learning via Selective State Summarization

Key Points

Mamba-DQN captures temporal dependencies in state sequences, which improves the stability and sample efficiency of policy learning.
In experiments, Mamba-DQN learns policies more stably and efficiently than conventional DQN, LSTM-DQN, and Transformer-DQN models.
The Mamba-SSM encoder has linear computational complexity and is optimized for parallel processing, enabling real-time learning and policy updates in environments with high state transition rates.
The framework uses latent summaries of state sequences for both Q-value estimation and prioritized experience replay, improving sample efficiency.

Abstract

This study proposes a novel framework, Mamba-DQN, which integrates the state space-based time-series encoder Mamba-SSM into the Deep Q-Network (DQN) architecture to improve reinforcement learning performance in dynamic environments. Conventional reinforcement learning models primarily rely on instantaneous state information, limiting their ability to effectively capture temporal dependencies. To address this limitation, the proposed Mamba-DQN generates latent representations that summarize temporal information from state sequences and utilizes them for both Q-value estimation and Prioritized Experience Replay (PER), thereby enhancing the adaptability of policy learning and improving sample efficiency. The Mamba-SSM offers linear computational complexity and is optimized for parallel processing, enabling real-time learning and policy updates even in environments characterized by high state transition rates. The effectiveness of the proposed framework was validated through experiments conducted in environments with strong temporal dependencies and sparse rewards. Experimental results demonstrate that Mamba-DQN achieves superior stability and efficiency in policy learning compared to conventional DQN, LSTM-DQN, and Transformer-DQN models.
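The data flow described in the abstract (state sequence, latent summary, Q-values, replay priority) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the fixed-decay linear recurrence is a toy stand-in for the selective Mamba-SSM encoder, the Q-head is a plain linear layer, and the priority rule is the standard proportional PER formula based on the absolute TD error. The function names, `decay`, `alpha`, and `eps` are all hypothetical.

```python
def summarize(states, decay=0.9):
    """Toy linear state-space recurrence (assumption, not Mamba-SSM):
    h_t = decay * h_{t-1} + (1 - decay) * x_t.
    One pass over the sequence, so cost grows linearly with its length."""
    h = [0.0] * len(states[0])
    for x in states:
        h = [decay * hi + (1.0 - decay) * xi for hi, xi in zip(h, x)]
    return h

def q_values(latent, weights):
    """Hypothetical linear Q-head: one weight row per action."""
    return [sum(w * z for w, z in zip(row, latent)) for row in weights]

def td_error(q_sa, reward, next_q, gamma=0.99, done=False):
    """One-step TD error: target minus the current estimate Q(s, a)."""
    target = reward if done else reward + gamma * max(next_q)
    return target - q_sa

def per_priority(delta, alpha=0.6, eps=1e-6):
    """Proportional PER priority: (|delta| + eps) ** alpha."""
    return (abs(delta) + eps) ** alpha

def sampling_probs(priorities):
    """Normalize priorities into replay sampling probabilities."""
    total = sum(priorities)
    return [p / total for p in priorities]

# Example: a 3-step sequence of 2-dimensional states and 2 actions.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latent = summarize(seq)
q = q_values(latent, weights=[[1.0, 0.0], [0.0, 1.0]])
delta = td_error(q_sa=q[0], reward=1.0, next_q=q)
probs = sampling_probs([per_priority(delta), per_priority(0.0)])
```

In this sketch the latent summary only affects the priority indirectly, through the Q-values that enter the TD error. How the paper couples the latent representation to PER is not specified in the abstract, so that part should be read as a placeholder.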
