Real-world control tasks frequently operate under partial observability, where complete state information is unavailable due to sensor limitations, noise, and inherent system complexity. Such scenarios are often modeled as Partially Observable Markov Decision Processes (POMDPs). Deep Reinforcement Learning (DRL) with recurrent neural networks, particularly Long Short-Term Memory (LSTM) networks, is a prevalent approach to POMDPs, but it often incurs substantial computational cost and can suffer from training instability, which poses significant challenges for deployment in resource-constrained environments such as edge devices. This study proposes ESN-TD3, a novel DRL framework that integrates Echo State Networks (ESNs) with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. We demonstrate that ESN-TD3 significantly accelerates learning in partially observable control tasks, reducing training time by about a factor of five compared to conventional LSTM-based methods while achieving comparable performance in POMDP swing-up tasks. The proposed method broadens DRL's applicability to real-world systems where computational resources are limited and learning acceleration is critical.
Matsuki et al. (Sat,) studied this question.
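
To make the role of the ESN concrete, the following is a minimal sketch of a leaky-integrator echo state reservoir that compresses an observation history into a fixed-length feature vector, which a TD3 actor and critic could then take as input in place of the unobserved full state. It is an illustration under stated assumptions, not the architecture used in the study: the function names (`make_reservoir`, `step`, `encode`), the reservoir size, the leak rate, the connection density, and the use of a row-sum bound instead of an exact spectral-radius computation are all hypothetical choices made here. The property it illustrates is the one that makes ESNs inexpensive: the reservoir weights are drawn once at random and never trained.

```python
import math
import random


def make_reservoir(n_in, n_res, scale=0.9, density=0.2, seed=0):
    """Draw fixed random input and recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
            for _ in range(n_res)]
    w_res = [[rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
              for _ in range(n_res)]
             for _ in range(n_res)]
    # Assumption: rather than computing eigenvalues, rescale so the largest
    # absolute row sum equals `scale`. That row sum is an upper bound on the
    # spectral radius, so this is a conservative way to keep it below 1.
    max_row_sum = max(sum(abs(w) for w in row) for row in w_res) or 1.0
    w_res = [[w * scale / max_row_sum for w in row] for row in w_res]
    return w_in, w_res


def step(state, obs, w_in, w_res, leak=0.3):
    """One leaky-integrator update: x <- (1 - a) x + a tanh(W_in o + W x)."""
    new_state = []
    for i in range(len(state)):
        pre = sum(w * o for w, o in zip(w_in[i], obs))
        pre += sum(w * s for w, s in zip(w_res[i], state))
        new_state.append((1.0 - leak) * state[i] + leak * math.tanh(pre))
    return new_state


def encode(observations, w_in, w_res, leak=0.3):
    """Run the reservoir over an observation sequence, starting from zero."""
    state = [0.0] * len(w_res)
    for obs in observations:
        state = step(state, obs, w_in, w_res, leak)
    return state


# Hypothetical partial observations: angle only, as (cos, sin) pairs.
w_in, w_res = make_reservoir(n_in=2, n_res=32)
obs_seq = [(math.cos(0.1 * t), math.sin(0.1 * t)) for t in range(20)]
features = encode(obs_seq, w_in, w_res)

# Two histories that end in the same observation yield different features,
# which is the memory a feedforward policy on raw observations would lack.
hist_a = encode([(1.0, 0.0), (0.0, 1.0)], w_in, w_res)
hist_b = encode([(-1.0, 0.0), (0.0, 1.0)], w_in, w_res)
```

In this sketch only the networks that consume `features` would be trained, so no gradients flow back through time in the recurrent part. That is the general reason reservoir approaches tend to train faster than LSTM-based ones, although the sketch itself makes no claim about the speedup reported above.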