This study proposes a novel framework, Mamba-DQN, which integrates the state space-based time-series encoder Mamba-SSM into the Deep Q-Network (DQN) architecture to improve reinforcement learning performance in dynamic environments. Conventional reinforcement learning models primarily rely on instantaneous state information, limiting their ability to effectively capture temporal dependencies. To address this limitation, the proposed Mamba-DQN generates latent representations that summarize temporal information from state sequences and utilizes them for both Q-value estimation and Prioritized Experience Replay (PER), thereby enhancing the adaptability of policy learning and improving sample efficiency. The Mamba-SSM offers linear computational complexity and is optimized for parallel processing, enabling real-time learning and policy updates even in environments characterized by high state transition rates. The effectiveness of the proposed framework was validated through experiments conducted in environments with strong temporal dependencies and sparse rewards. Experimental results demonstrate that Mamba-DQN achieves superior stability and efficiency in policy learning compared to conventional DQN, LSTM-DQN, and Transformer-DQN models.
Ryu et al. (Thu,) studied this question.