What question did this study set out to answer?

The study aims to improve reinforcement learning performance when part of the observation data is missing.

May 13, 2026 | Open Access

MIA-GRL: Mutual Information Aligned Generative Reinforcement Learning with Missing Data

Key Points

Introduced mutual information aligned generative reinforcement learning (MIA-GRL)
Utilized spatiotemporal collaborative reconstruction for trajectory synthesis
Designed an auxiliary loss function based on mutual information maximization
Implemented missing-aware contrastive learning for robust representation learning
Demonstrated improved policy network performance in the presence of missing data
Outperformed state-of-the-art methods under high missing data rates
Retained critical information from the original data during reconstruction

Abstract

Traditional reinforcement learning (RL) methods typically assume complete observations, but real-world deployments often face missing data: some state dimensions become unavailable because of sensor failures, communication blackouts, or temporal sampling discontinuities. Although recent research has made significant progress on missing-data scenarios, current methods remain limited by their inability to handle prolonged observation gaps and by their low tolerance for high missing rates. In this work, we investigate the performance of reinforcement learning under missing data. We introduce a novel method, Mutual Information Aligned Generative Reinforcement Learning (MIA-GRL), which employs spatiotemporal collaborative reconstruction to learn from historical RL trajectories. The method uses contextual information to synthesize trajectories that capture the characteristics and diversity of the environment. We design an auxiliary loss function that maximizes the mutual information between the completed data and the original data, so that the completed data retains as much critical information from the original data as possible. Additionally, we use missing-aware contrastive learning to learn representations that are robust to missing patterns, enabling the policy network to capture intrinsic task-relevant features. Experiments show that our method outperforms state-of-the-art methods under missing-data conditions.
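
The abstract does not say which estimator the mutual information loss uses. One common choice for maximizing a lower bound on mutual information between paired samples is the InfoNCE objective, and the sketch below assumes that estimator purely for illustration; it is not the authors' loss. The names `info_nce_lower_bound` and `random_mask`, the temperature value, the random-masking scheme, and the zero-filled "completion" standing in for a learned generator are all hypothetical.

```python
import numpy as np


def info_nce_lower_bound(completed, original, temperature=0.1):
    """InfoNCE lower bound (in nats) on the mutual information between
    paired rows of `completed` and `original`, both shaped (batch, dim).
    Illustrative assumption; the paper's estimator is not specified."""
    eps = 1e-8  # guards against division by zero for all-missing rows
    c = completed / (np.linalg.norm(completed, axis=1, keepdims=True) + eps)
    o = original / (np.linalg.norm(original, axis=1, keepdims=True) + eps)
    logits = c @ o.T / temperature  # (batch, batch) scaled cosine similarities
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matched pairs lie on the diagonal; the bound is log(batch) plus
    # their mean log-probability, so it can never exceed log(batch).
    return np.log(len(c)) + np.mean(np.diag(log_probs))


def random_mask(states, missing_rate, rng):
    """Hypothetical missing-data model: drop each entry independently with
    probability `missing_rate`. Returns zero-filled states and the mask."""
    mask = rng.random(states.shape) >= missing_rate  # True means observed
    return states * mask, mask


rng = np.random.default_rng(0)
states = rng.normal(size=(64, 8))  # toy batch of 8-dimensional states
masked, mask = random_mask(states, missing_rate=0.3, rng=rng)

# Zero-filled states stand in for the output of a learned reconstruction model.
bound_matched = info_nce_lower_bound(masked, states)
# Unrelated random data shares no information with the original states.
bound_shuffled = info_nce_lower_bound(rng.normal(size=(64, 8)), states)
```

In a training loop, the negative of this bound would be added to the RL objective as an auxiliary term, so that gradient descent pushes the completed data toward higher mutual information with the original data. The same contrastive form could plausibly serve for missing-aware representation learning by treating two differently masked views of one state as a positive pair, though the abstract does not confirm that design.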
