What question did this study set out to answer?

The study aims to develop a robust algorithm for predicting human-robot interaction behavior in real time.

April 1, 2026

Research on interactive behavior prediction algorithm of human-robot collaboration based on generative large model

Key Points

Proposed a behavior prediction framework based on a generative large model.
Employed a Transformer architecture to model multimodal information, including vision and language.
Used contrastive learning to sharpen prediction representations and proximal policy optimization (PPO) to tune the policy network.
Implemented attention visualization and Monte Carlo Dropout to make decision-making interpretable.
Reduced average displacement error (ADE) by 25.6% and final displacement error (FDE) by 22.6% compared to the best existing baseline.
Improved multimodal accuracy (MM Acc) by 28.8%.
Demonstrated online adaptability to sudden interference, with improved accuracy and robustness.
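
For readers unfamiliar with the displacement-error metrics cited above, the following is a minimal sketch of how ADE and FDE are usually computed in trajectory prediction, assuming 2-D positions and one predicted trajectory per ground-truth trajectory. The summary does not give the paper's exact definitions, so the function names `displacement_errors` and `relative_reduction` and the sample points are illustrative assumptions, not the authors' code or data.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory.

    pred and truth are equal-length sequences of (x, y) points.
    ADE is the mean Euclidean distance over all time steps;
    FDE is the Euclidean distance at the final time step.
    """
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Made-up points for illustration only (not data from the paper).
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observed = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, observed)
print(ade, fde)  # per-step distances are 0, 1, 2, so ADE = 1.0 and FDE = 2.0
```

Under this convention, a reported "25.6% reduction in ADE" means that `relative_reduction(baseline_ade, new_ade)` evaluates to about 25.6.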

Abstract

To meet the urgent need in human-robot collaboration (HRC) for real-time, accurate prediction of interaction behavior, this paper proposes a spatio-temporal multimodal prediction framework based on a generative large model. The method treats human-robot interaction as a sequence generation task: a Transformer backbone jointly models multimodal information such as vision, language, and force/joint state, and an autoregressive multi-head spatio-temporal cross-attention mechanism generates future human and robot behavior sequences. To improve dynamic adaptability, contrastive learning is introduced to make the prediction representations more discriminative, and proximal policy optimization (PPO) fine-tunes the policy network online with prediction error as the reward signal. Attention-weight visualization and Monte Carlo Dropout uncertainty quantification make the decision process interpretable and the risk controllable. Experiments on the public dataset HRI Interaction and the self-built simulation environment HRC Sim show that, compared with the best existing baseline, the proposed method reduces average displacement error (ADE) by 25.6% and final displacement error (FDE) by 22.6%, and improves multimodal accuracy (MM Acc) by 28.8%. It also corrects its predictions quickly online under sudden interference, supporting its combined advantages in accuracy, robustness, and interpretability.
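
The Monte Carlo Dropout step mentioned in the abstract can be illustrated with a small sketch: dropout stays active at inference time, the same input is passed through the model many times, and the spread of the outputs serves as an uncertainty estimate. The toy linear model below stands in for the paper's Transformer, and every name and value in it (`dropout_forward`, `mc_dropout_predict`, the weights, the drop rate) is a hypothetical choice for illustration, not the authors' implementation.

```python
import random
import statistics


def dropout_forward(x, weights, p_drop, rng):
    """One stochastic pass of a toy linear model with dropout on its inputs."""
    keep = 1.0 - p_drop
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:
            # Inverted-dropout scaling keeps the expected output unchanged.
            total += wi * xi / keep
    return total


def mc_dropout_predict(x, weights, p_drop=0.2, n_samples=200, seed=0):
    """Return (mean, std) of the output over n_samples stochastic passes."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    rng = random.Random(seed)
    samples = [dropout_forward(x, weights, p_drop, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pstdev(samples)


# Illustrative inputs only; a larger std signals a less certain prediction.
mean, std = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], p_drop=0.5)
print(f"prediction ~ {mean:.3f}, uncertainty (std) ~ {std:.3f}")
```

A downstream controller could, for example, slow the robot or defer to a safe fallback whenever the standard deviation exceeds a chosen threshold; how the paper turns this estimate into "controllable risk" is not specified in the abstract.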
