What type of study is this?

This is a quantitative study.

September 24, 2025 (Open Access)

OmniD: Generalizable Robot Manipulation Policy via Image-Based BEV Representation

Key Points

OmniD achieves an 84% average improvement over the best baseline in few-shot robot manipulation experiments, indicating effective generalization.
The deformable attention-based Omni-Feature Generator enhances feature selection while reducing background noise.
Challenges of existing methods are addressed by synthesizing image observations into a unified bird's-eye view.
OmniD overcomes limitations of conventional visuomotor policies that suffer from overfitting in certain scenarios.

Abstract

A visuomotor policy can easily overfit to characteristics of its training dataset, such as fixed camera positions and backgrounds. This overfitting lets the policy perform well in in-distribution scenarios but underperform in out-of-distribution ones. Existing methods also have difficulty fusing multi-view information into an effective 3D representation. To tackle these issues, we propose Omni-Vision Diffusion Policy (OmniD), a multi-view fusion framework that synthesizes image observations into a unified bird's-eye view (BEV) representation. We introduce a deformable attention-based Omni-Feature Generator (OFG) to selectively abstract task-relevant features while suppressing view-specific noise and background distractions. OmniD achieves 11%, 17%, and 84% average improvement over the best baseline model in in-distribution, out-of-distribution, and few-shot experiments, respectively. Training code and the simulation benchmark are available at https://github.com/1mather/omnid.git
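To make the idea of fusing several camera views into one BEV grid concrete, here is a minimal sketch. It is not OmniD's implementation: the `Camera` class, the `project` and `fuse_to_bev` functions, the axis-aligned pinhole model, and the scalar feature maps are all hypothetical simplifications, and a plain average over visible views stands in for the learned deformable attention of the OFG.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    """Hypothetical pinhole camera whose optical axis is the world +z axis."""
    position: Vec3               # camera centre in world coordinates
    focal: float                 # focal length in pixels
    width: int                   # feature-map width in pixels
    height: int                  # feature-map height in pixels
    features: List[List[float]]  # height x width map of scalar features


def project(cam: Camera, point: Vec3) -> Optional[Tuple[int, int]]:
    """Return the (u, v) pixel that sees `point`, or None if it is not visible."""
    # Assumption: no rotation, so world-to-camera is a pure translation.
    x = point[0] - cam.position[0]
    y = point[1] - cam.position[1]
    z = point[2] - cam.position[2]
    if z <= 0:  # point is behind the camera
        return None
    u = int(round(cam.focal * x / z + (cam.width - 1) / 2))
    v = int(round(cam.focal * y / z + (cam.height - 1) / 2))
    if 0 <= u < cam.width and 0 <= v < cam.height:
        return u, v
    return None


def fuse_to_bev(cams: Sequence[Camera], xs: Sequence[float],
                ys: Sequence[float], plane_z: float = 0.0) -> List[List[float]]:
    """Build a BEV grid on the plane z = plane_z by averaging visible views."""
    bev = []
    for y in ys:
        row = []
        for x in xs:
            samples = []
            for cam in cams:
                pixel = project(cam, (x, y, plane_z))
                if pixel is not None:
                    u, v = pixel
                    samples.append(cam.features[v][u])
            # Uniform average; a learned model would weight the samples instead.
            row.append(sum(samples) / len(samples) if samples else 0.0)
        bev.append(row)
    return bev
```

Each BEV cell is projected into every camera, the feature at the nearest pixel is sampled, and cells that no camera sees are left at 0.0. Because the grid is defined in world coordinates rather than in any one image, the output layout does not depend on where a particular camera is placed, which is the property a BEV representation is meant to provide.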
