What question did this study set out to answer?

This research aims to improve multimodal remote sensing image classification by addressing uncertainties in data acquisition and labeling.

June 4, 2026

RL-FM: Reinforcement Learning-Driven Flow Matching for Multimodal Remote Sensing Image Classification

Key Points

Developed a flow matching optimization framework using reinforcement learning (RL-FM).
Utilized variational autoencoders to model feature distributions of remote sensing images from different modalities.
Formulated distribution evolution as a Markov decision process to learn paths from initial to target distributions.
Achieved state-of-the-art performance on multiple benchmark datasets for multimodal remote sensing classification.
Demonstrated better generalization performance by effectively addressing uncertainties in training data.
Improved robustness against spurious correlations and noise in classification tasks.
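
The mixture construction mentioned in the key points can be illustrated with a minimal sketch. It assumes each modality's variational autoencoder has already produced a diagonal Gaussian posterior (a mean and a log-variance vector) and that the initial distribution is a weighted mixture of those Gaussians. The function name `sample_gaussian_mixture`, the equal weights, and the toy HSI/LiDAR parameters are illustrative assumptions, not details taken from the paper or its repository.

```python
import math
import random


def sample_gaussian_mixture(mus, logvars, weights, n_samples, rng):
    """Sample from a mixture of diagonal Gaussians, one component per modality.

    mus, logvars: per-modality lists of latent means and log-variances
    (hypothetical stand-ins for VAE encoder outputs).
    """
    stds = [[math.exp(0.5 * lv) for lv in row] for row in logvars]
    samples, components = [], []
    for _ in range(n_samples):
        # Choose which modality's Gaussian this sample comes from.
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        # Reparameterised draw: z = mu + sigma * eps, with eps ~ N(0, 1).
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mus[k], stds[k])]
        samples.append(z)
        components.append(k)
    return samples, components


# Toy posteriors for two modalities (values are made up for illustration).
hsi_mu, hsi_logvar = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
lidar_mu, lidar_logvar = [3.0, 3.0, 3.0], [-2.0, -2.0, -2.0]

rng = random.Random(0)
z0, comp = sample_gaussian_mixture(
    [hsi_mu, lidar_mu], [hsi_logvar, lidar_logvar], [0.5, 0.5], 8, rng
)
```

Samples such as `z0` would then serve as starting points for the flow; how the paper actually weights or parameterises the mixture is not stated in this summary.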

Abstract

Multimodal remote sensing image classification improves a model's capacity to recognize complex land-cover patterns by integrating data from heterogeneous sensors, such as hyperspectral imagery (HSI) and light detection and ranging (LiDAR). However, many existing classification models ignore the aleatoric and epistemic uncertainties introduced during data acquisition and labeling. As a result, they become less robust to noise and more vulnerable to spurious correlations, which ultimately weakens their ability to generalize to unseen data. To mitigate these issues, multimodal joint distribution modeling is reformulated as a flow matching optimization problem that learns a distribution evolution process under an unknown target distribution. A reinforcement learning-driven flow matching (RL-FM) framework is proposed for multimodal remote sensing image classification. Specifically, the feature distributions of remote sensing images from different modalities are first modeled using variational autoencoders, and a multimodal mixture distribution is then constructed via a Gaussian mixture strategy to serve as the initial distribution for flow matching. To perform flow matching optimization when the target distribution is unknown, label information is further exploited to guide the transformation of the initial distribution toward the target distribution. At the same time, the distribution evolution process of flow matching is formulated as a Markov decision process (MDP), enabling the model to learn an evolution path from the initial distribution to the target distribution by maximizing the expected cumulative reward. RL-FM jointly accounts for immediate classification loss and long-term generalization performance, thereby alleviating the suboptimal convergence caused by myopic gradient updates. Furthermore, by incorporating counterfactual causal inference into policy optimization, a counterfactual proximal policy optimization (CPPO) algorithm is designed. CPPO strengthens the model's capacity to capture the causal relationship between action and reward, thus improving generalization in scenarios with limited labeled samples. Experimental results on multiple benchmark datasets demonstrate that the proposed RL-FM achieves state-of-the-art performance on multimodal remote sensing image classification tasks. The code is available at: https://github.com/zwdmw/RL-FM.
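
To make the MDP view of flow matching more concrete, the following is a minimal sketch and not the authors' implementation. It treats each Euler step of the flow as one transition: the state is the current point and time, the action is the velocity, and the reward is computed after the step. The reward used here (negative squared distance to a class prototype), the straight-line policy, and all function names are hypothetical stand-ins for the label-guided reward and learned policy described in the abstract. The last function is the standard PPO clipped surrogate term, shown only for reference; the counterfactual component of CPPO is not reproduced.

```python
import math


def rollout(x0, policy, reward_fn, n_steps=10):
    """Integrate dx/dt = policy(x, t) with Euler steps; each step is one MDP transition."""
    dt = 1.0 / n_steps
    x, rewards = list(x0), []
    for i in range(n_steps):
        t = i * dt
        v = policy(x, t)                            # action: velocity at state (x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # state transition
        rewards.append(reward_fn(x))                # immediate reward after the step
    return x, rewards


def discounted_return(rewards, gamma=0.99):
    """Cumulative reward with discount factor gamma, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def ppo_clip_term(logp_new, logp_old, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample (to be maximised)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Hypothetical class prototype standing in for label guidance.
prototype = [1.0, -1.0]


def reward_fn(x):
    return -sum((xi - pi) ** 2 for xi, pi in zip(x, prototype))


def policy(x, t):
    # Straight-line velocity toward the prototype; a learned policy would replace this.
    return [(pi - xi) / (1.0 - t) for xi, pi in zip(x, prototype)]


x_final, rewards = rollout([0.0, 0.0], policy, reward_fn)
episode_return = discounted_return(rewards)
```

With this hand-written policy the rollout ends at the prototype, so the per-step rewards rise toward zero. A trained policy would instead be updated from many such episodes, using advantages estimated from returns like `episode_return`.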

Cite This Study

Zhang et al. (2026) studied this question.

synapsesocial.com/papers/6a21151ad499ed480b16e62c
https://doi.org/10.1109/tnnls.2026.3696365
