What type of study is this?

This is a quantitative study.

October 9, 2025 · Open Access

Policy Learning from Large Vision-Language Model Feedback without Reward Modeling

Key Points

PLARE matches or outperforms existing VLM-based reward generation methods on MetaWorld robotic manipulation tasks.
The approach eliminates labor-intensive reward function design by querying a VLM for preference labels instead.
Experiments with a physical robot show that PLARE is effective in real-world manipulation tasks.
Because it trains offline, the method improves safety by reducing reliance on costly and potentially hazardous online data collection.

Abstract

Offline reinforcement learning (RL) provides a powerful framework for training robotic agents using pre-collected, suboptimal datasets, eliminating the need for costly, time-consuming, and potentially hazardous online interactions. This is particularly useful in safety-critical real-world applications, where online data collection is expensive and impractical. However, existing offline RL algorithms typically require reward-labeled data, which introduces an additional bottleneck: reward function design is itself costly, labor-intensive, and requires significant domain expertise. In this paper, we introduce PLARE, a novel approach that leverages large vision-language models (VLMs) to provide guidance signals for agent training. Instead of relying on manually designed reward functions, PLARE queries a VLM for preference labels on pairs of visual trajectory segments based on a language task description. The policy is then trained directly from these preference labels using a supervised contrastive preference learning objective, bypassing the need to learn an explicit reward model. In extensive experiments on robotic manipulation tasks from the MetaWorld benchmark, PLARE achieves performance on par with or surpassing existing state-of-the-art VLM-based reward generation methods. Furthermore, we demonstrate the effectiveness of PLARE in real-world manipulation tasks with a physical robot, further validating its practical applicability.
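The abstract says the policy is trained directly on VLM preference labels with a supervised contrastive objective instead of a learned reward model. This page does not give the exact loss, so the sketch below assumes a form similar to Contrastive Preference Learning (CPL): each segment is scored by the discounted, scaled sum of the policy's action log-probabilities, and the loss is a logistic (Bradley-Terry style) contrast between the preferred and rejected segments. The names `segment_score`, `contrastive_preference_loss`, and `loss_from_vlm_label`, the parameters `alpha`, `gamma`, and `lam`, and the 0/1 label convention are all hypothetical illustrations, not PLARE's actual implementation.

```python
import math
from typing import Sequence


def segment_score(log_probs: Sequence[float], alpha: float = 0.1, gamma: float = 1.0) -> float:
    """Score a trajectory segment as the discounted sum of alpha * log pi(a_t | s_t).

    Assumption: a CPL-style score; PLARE's exact scoring may differ.
    """
    return sum(alpha * (gamma ** t) * lp for t, lp in enumerate(log_probs))


def contrastive_preference_loss(
    logp_preferred: Sequence[float],
    logp_rejected: Sequence[float],
    alpha: float = 0.1,
    gamma: float = 1.0,
    lam: float = 1.0,  # hypothetical weight on the rejected segment's score
) -> float:
    """-log( exp(s+) / (exp(s+) + exp(lam * s-)) ), computed as a stable softplus."""
    s_pos = segment_score(logp_preferred, alpha, gamma)
    s_neg = lam * segment_score(logp_rejected, alpha, gamma)
    diff = s_neg - s_pos
    # softplus(diff) = log(1 + exp(diff)), written to avoid overflow for large |diff|
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def loss_from_vlm_label(
    logp_a: Sequence[float], logp_b: Sequence[float], label: int, **kwargs: float
) -> float:
    """Hypothetical convention: label 0 means the VLM preferred segment A, 1 means segment B."""
    if label == 0:
        return contrastive_preference_loss(logp_a, logp_b, **kwargs)
    return contrastive_preference_loss(logp_b, logp_a, **kwargs)


if __name__ == "__main__":
    # Per-step policy log-probabilities for two segments (illustrative numbers only).
    seg_a = [-0.1, -0.2]  # policy already finds these actions likely
    seg_b = [-2.0, -3.0]  # policy finds these actions unlikely
    print(loss_from_vlm_label(seg_a, seg_b, label=0))
    print(loss_from_vlm_label(seg_a, seg_b, label=1))
```

Minimizing this loss raises the policy's likelihood of actions in segments the VLM prefers relative to those it rejects. With the numbers above, the loss is smaller when the VLM prefers segment A (the policy already agrees) than when it prefers segment B. In this sketch no reward function is ever fit; the preference label acts directly on the policy's log-probabilities.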

Cite This Study

Luu et al. (2025). Policy Learning from Large Vision-Language Model Feedback without Reward Modeling.

synapsesocial.com/papers/68e70db790569dd607ee6833
https://doi.org/10.48550/arxiv.2507.23391