Skill transfer for contact-rich manipulation in industrial robotics remains difficult because conventional vision-based systems infer contact forces only indirectly from visual observations. To address this limitation, this study developed a vision-force fusion framework that combines visual and haptic measurements through learned attention mechanisms to generate manipulation trajectories satisfying contact constraints. The approach uses diffusion-based iterative refinement conditioned on multimodal observations from RGB-D cameras and force sensors. Experimental validation used 387 demonstrations from diverse contact-rich assembly tasks, collected with a collaborative robot equipped with multimodal perception. The approach achieved an 87.3% success rate across target task variants, outperforming Diffusion Policy (78.6%), Action Chunking Transformer (73.9%), and Behavior Cloning (62.3%), a 25.0 percentage point improvement over the Behavior Cloning baseline. Ablation studies confirmed the necessity of multimodal fusion: the vision-only variant achieved 67.8% and the force-only variant 59.1%. Attention-based fusion scored 8.2 percentage points higher than a linear weighted combination while maintaining a force-tracking root mean square error (RMSE) of 1.38 N. Cross-task generalization experiments showed consistent performance above 86% across geometric variations. Under sensor degradation, the approach maintained 78.4% success with force noise and 75.8% with vision impairment, while achieving a 3.09-second inference time suitable for real-time control. These results indicate that explicit integration of haptic measurements addresses limitations of vision-based force estimation, enabling more precise contact regulation for industrial assembly operations with tight tolerances and variable component geometries.
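To make the contrast between attention-based fusion and a linear weighted combination concrete, the following is a minimal sketch and not the authors' implementation. It assumes each modality has already been encoded as a fixed-length feature vector; the function names (`attention_fuse`, `linear_fuse`, `rmse`), the hand-supplied `query` vector (which would be learned in practice), and the fixed mixing weight `alpha` are all hypothetical choices made for illustration. The `rmse` helper shows how a force-tracking RMSE of the kind quoted above is computed from measured and reference forces.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(vision_feat, force_feat, query):
    """Fuse two modality vectors using scaled dot-product attention weights.

    `query` stands in for a learned vector (an assumption of this sketch);
    the weights therefore depend on the current observation.
    """
    d = len(query)
    feats = [vision_feat, force_feat]
    # One score per modality: similarity between the query and that modality.
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(d) for feat in feats
    ]
    weights = softmax(scores)
    fused = [sum(w * feat[i] for w, feat in zip(weights, feats)) for i in range(d)]
    return fused, weights


def linear_fuse(vision_feat, force_feat, alpha=0.5):
    """Baseline: fixed weights that do not depend on the observation."""
    return [alpha * v + (1 - alpha) * f for v, f in zip(vision_feat, force_feat)]


def rmse(measured, reference):
    """Root mean square error between two equal-length force sequences."""
    n = len(measured)
    return math.sqrt(sum((m - r) ** 2 for m, r in zip(measured, reference)) / n)


if __name__ == "__main__":
    vision = [1.0, 0.0]
    force = [0.0, 1.0]
    # A query aligned with the force feature shifts weight toward force.
    fused, weights = attention_fuse(vision, force, query=[0.0, 2.0])
    print("attention weights (vision, force):", weights)
    print("attention-fused feature:", fused)
    print("linear-fused feature:", linear_fuse(vision, force))
    print("force RMSE [N]:", rmse([2.0, 2.0], [0.0, 0.0]))
```

The design point this illustrates is that attention weights change with the observation (for example, leaning on force readings once contact is made), whereas the linear baseline applies the same weights everywhere.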
Hongliang Xing
Yuhang Zhao
Journal of Advanced Manufacturing Systems
Xing et al. (Sat,) studied this question.
www.synapsesocial.com/papers/69fa8ef304f884e66b5314a4 — DOI: https://doi.org/10.1142/s0219686728500011