Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework tailored for visual reinforcement learning, enabling the trained robot policies to generalize across a combination of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning approach fused with a Spatial Transformer Network (STN) module to capture shared semantic information and correspondences among different viewpoints. In addition, we employ a curriculum-based randomization and augmentation approach to stabilize the RL training process and strengthen the visual generalization ability. To demonstrate the effectiveness of Maniwhere, we design 8 tasks encompassing articulated-object, bi-manual, and dexterous hand manipulation, and show Maniwhere's strong visual generalization and sim2real transfer abilities across 3 hardware platforms. Our experiments show that Maniwhere significantly outperforms existing state-of-the-art methods. Videos are provided at https://gemcollector.github.io/maniwhere/.
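The abstract does not spell out how the curriculum is scheduled. As a rough illustration only, curriculum-based randomization can be pictured as a strength value that stays at zero during a warm-up phase and then ramps up to full randomization. The sketch below is an assumption, not the authors' implementation: the function names, the linear ramp, the step counts, and the camera-offset example are all hypothetical.

```python
import random


def curriculum_scale(step: int, warmup_steps: int, ramp_steps: int) -> float:
    """Return a randomization strength in [0, 1] for a training step.

    Hypothetical schedule: no randomization during warm-up, then a
    linear ramp to full strength over `ramp_steps` steps.
    """
    if step < warmup_steps:
        return 0.0
    if ramp_steps <= 0:
        return 1.0
    return min(1.0, (step - warmup_steps) / ramp_steps)


def sample_camera_offset(
    rng: random.Random,
    step: int,
    max_offset_deg: float = 30.0,  # assumed bound, not taken from the paper
    warmup_steps: int = 1000,      # assumed value
    ramp_steps: int = 9000,        # assumed value
) -> float:
    """Sample a camera-angle perturbation whose range grows with the curriculum."""
    scale = curriculum_scale(step, warmup_steps, ramp_steps)
    return rng.uniform(-1.0, 1.0) * scale * max_offset_deg


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 2000, 5500, 20000):
        print(step, curriculum_scale(step, 1000, 9000), sample_camera_offset(rng, step))
```

Starting with an unperturbed view and widening the perturbation range gradually is one common way to keep early RL training stable; the actual schedule used by Maniwhere may differ.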
Yuan et al. studied this question.
www.synapsesocial.com/papers/68e5f93bb6db64358758d731 — DOI: https://doi.org/10.48550/arxiv.2407.15815
Zhecheng Yuan
Tianming Wei
Shuiqi Cheng