What question did this study set out to answer?

The aim is to improve 3D representation learning by addressing limitations in conventional masking strategies that neglect geometric differences among points.

May 25, 2026

Point-RMAE: Reinforcement Masked Autoencoder for 3D Representation Learning

Key Points

Introduced a reinforcement learning approach to optimize masking strategies during pretraining.
Developed the Masking Strategy Analyzer and Dynamic Masking Generator to adaptively select masking protocols.
Incorporated a Flow Matching Point Cloud Fast Generator to enrich the reward function with distribution-aware signals.
Achieved superior performance in multiple downstream tasks including shape classification and action recognition across ten datasets.
Demonstrated significant improvements in reconstruction quality and representation capture compared to existing methods.
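
For readers unfamiliar with the conventional strategies these points contrast against, the sketch below shows fixed-ratio random masking and block masking over point-cloud patches. It is an illustration only and not the paper's code: the function names, the 0.6 ratio, the 64 synthetic patch centers, and the nearest-to-seed definition of a block are all assumptions made for the example.

```python
import math
import random

def random_mask(num_patches, ratio, rng):
    """Mask a fixed fraction of patches chosen uniformly at random."""
    num_masked = int(round(num_patches * ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]

def block_mask(centers, ratio, rng):
    """Mask the patches whose centers lie nearest to a randomly chosen seed patch."""
    num_masked = int(round(len(centers) * ratio))
    seed = centers[rng.randrange(len(centers))]
    # Sort patch indices by Euclidean distance to the seed center.
    order = sorted(range(len(centers)), key=lambda i: math.dist(centers[i], seed))
    masked = set(order[:num_masked])
    return [i in masked for i in range(len(centers))]

# Synthetic patch centers in the unit cube (hypothetical data).
rng = random.Random(0)
centers = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]

rand_mask = random_mask(len(centers), 0.6, rng)
blk_mask = block_mask(centers, 0.6, rng)
print(sum(rand_mask), sum(blk_mask))  # both strategies mask the same number of patches
```

Both functions hide the same number of patches regardless of the shape's geometry, which is the limitation the study targets.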

Abstract

Mainstream masked point modeling methods for 3D representation learning typically employ predefined, fixed-ratio random or block masking strategies, aiming to obtain optimal representations and high downstream performance. However, these empirical designs overlook the substantial differences in geometric information and structural importance among 3D points, leading to a suboptimal trade-off between representation capture capability and reconstruction difficulty. To address this issue, we are the first to pose this decision-making problem to a reinforcement learning agent, and we propose a Reinforcement Masked Autoencoder for 3D representation learning, named Point-RMAE. Guided by geometric features as the state factor, the method uses the Masking Strategy Analyzer and the Dynamic Masking Generator to adaptively decide and apply the masking strategy during pretraining. The Masking Ratio Scheduling module dynamically adjusts the masking ratio based on the optimal strategy. The analyzer is then updated with multiscale rewards derived from reconstruction quality, distribution-aware feedback, and policy exploration. Notably, to enrich the reward function with distribution-aware signals and avoid decision collapse, we propose a Flow Matching Point Cloud Fast Generator that guides the selected masking decisions. Our method achieves outstanding performance across downstream tasks such as shape classification, medical diagnosis, object detection, action recognition, denoising, and multiscale scene segmentation on ten popular 3D and 4D datasets. More importantly, Point-RMAE pioneers the application of reinforcement learning in 3D self-supervised representation learning.
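
The abstract describes an agent that picks a masking strategy and is updated by a reward combining reconstruction quality, distribution-aware feedback, and policy exploration. A toy sketch of that loop follows, using a plain REINFORCE update on a softmax policy. This is not the paper's Masking Strategy Analyzer: the two-action set, the reward weights, the count-based exploration bonus, and the `toy_feedback` numbers are all hypothetical stand-ins chosen so the example runs on its own.

```python
import math
import random

STRATEGIES = ["random", "block"]  # hypothetical action set

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def composite_reward(recon_quality, dist_feedback, explore_bonus, w=(1.0, 0.5, 0.1)):
    """Weighted sum of the three reward terms; the weights are illustrative."""
    return w[0] * recon_quality + w[1] * dist_feedback + w[2] * explore_bonus

def reinforce_step(logits, action, reward, baseline, lr=0.1):
    """One REINFORCE update; for a softmax policy, grad log pi(a) = onehot(a) - probs."""
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]

def toy_feedback(action, rng):
    """Stand-in for pretraining signals; the values are made up for illustration."""
    base = 0.3 if action == 0 else 0.7
    return base + rng.gauss(0, 0.05), base + rng.gauss(0, 0.05)

rng = random.Random(0)
logits = [0.0, 0.0]
counts = [0, 0]
baseline = 0.0
for step in range(500):
    probs = softmax(logits)
    action = rng.choices(range(len(STRATEGIES)), weights=probs)[0]
    recon, dist = toy_feedback(action, rng)
    # Count-based bonus: rarely tried strategies earn a larger exploration reward.
    bonus = 1.0 / math.sqrt(1 + counts[action])
    counts[action] += 1
    reward = composite_reward(recon, dist, bonus)
    logits = reinforce_step(logits, action, reward, baseline)
    baseline = 0.9 * baseline + 0.1 * reward  # running-average baseline reduces variance

final_probs = softmax(logits)
print(dict(zip(STRATEGIES, final_probs)))
```

In this toy setup the policy shifts its probability toward whichever strategy yields the higher composite reward. In the method described above, the reward terms would instead come from the reconstruction and the flow-matching generator during pretraining.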
