Traditional reinforcement learning (RL) methods typically assume complete observations, but real-world deployments often face missing data: some state dimensions become unavailable because of sensor failures, communication blackouts, or temporal sampling discontinuities. Although recent research has made significant progress on missing-data scenarios, current methods remain limited by their inability to handle prolonged observation gaps and by their low tolerance for high missing rates. In this work, we investigate the performance of RL under missing data. We introduce a novel method, Mutual Information Aligned Generative Reinforcement Learning (MIA-GRL), which employs spatiotemporal collaborative reconstruction to learn from historical RL trajectories. Using contextual information, the method synthesizes trajectories that capture the characteristics and diversity of the environment. We design an auxiliary loss function that maximizes the mutual information between the completed data and the original data, so that the completed data retains as much critical information from the original data as possible. Additionally, we use missing-aware contrastive learning to learn representations that are robust to missing patterns, enabling the policy network to capture intrinsic task-relevant features. Experiments show that our method outperforms state-of-the-art methods under missing-data conditions.
Ma et al. (Fri,) studied this question.
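To make the mutual-information objective mentioned in the abstract more concrete, the sketch below shows one common way such an auxiliary loss could be set up: an InfoNCE-style contrastive bound between embeddings of completed data and embeddings of the original data. This is an illustrative assumption, not the estimator used in MIA-GRL, which the abstract does not specify. The function names `info_nce_loss` and `mi_lower_bound`, the cosine-similarity critic, and the `temperature` parameter are all hypothetical choices made for this example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def info_nce_loss(completed, original, temperature=0.1):
    """InfoNCE loss over a batch of paired embeddings (hypothetical helper).

    completed[i] and original[i] form a positive pair; every original[j]
    with j != i serves as a negative for completed[i]. Minimizing this loss
    pushes up a lower bound on the mutual information between the two views.
    """
    c = [normalize(v) for v in completed]
    o = [normalize(v) for v in original]
    n = len(c)
    total = 0.0
    for i in range(n):
        # Cosine similarities between completed sample i and every original.
        logits = [dot(c[i], o[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_denom - logits[i]
    return total / n


def mi_lower_bound(completed, original, temperature=0.1):
    """Batch estimate of the InfoNCE bound: log(N) minus the loss."""
    return math.log(len(completed)) - info_nce_loss(completed, original, temperature)
```

In a training loop, a term like this would typically be added to the RL objective with a weighting coefficient, so that the reconstruction module is rewarded when each completed sample is more similar to its own original than to the other originals in the batch. Note that the bound saturates at log(N) for a batch of size N, so larger batches are needed to express higher mutual information.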