The success of virtual reality (VR) agents is not defined by task completion alone; adaptability and the ability to move beyond repetitive, robotic behavior are equally critical for embodiment, social interaction, rehabilitation, and training. To address this challenge, we propose a hybrid training methodology that integrates imitation learning (IL) and reinforcement learning (RL) to develop agents that both master tasks and generalize beyond the demonstrations. Our framework combines Behavioral Cloning (BC) for rapid policy initialization, Proximal Policy Optimization (PPO) for stable reinforcement-driven learning, and Generative Adversarial Imitation Learning (GAIL) for imitation-based rewards. The resulting policies converge efficiently while avoiding repetitive imitation. We implemented this framework in a Unity-based VR environment in which participants performed a single-joint cube pick-and-place task using a headset and controller. The collected human demonstrations were used to train the agents and to evaluate them with multiple motion metrics: Dynamic Time Warping (DTW), Fréchet distance, minimum jerk cost, Fitts’s law, and curvature–velocity coupling. Experimental results showed that hybrid-trained agents achieved stable convergence, consistent task mastery, and motion trajectories comparable to the human demonstrations in temporal, spatial, and smoothness terms. Notably, the agents exhibited adaptive, non-repetitive behavior without explicitly optimizing for biomechanical fidelity, suggesting that human-like variability can emerge naturally from our customized hybrid IL+RL training. These findings provide a foundation for scaling the methodology to multi-joint and full-body motion learning, with promising implications for VR applications in rehabilitation, training, and embodied interaction.
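To make three of the trajectory metrics named above concrete, here is a minimal pure-Python sketch that compares an agent trajectory with a human one using DTW, the discrete Fréchet distance, and a finite-difference approximation of the minimum-jerk cost. It is an illustration under stated assumptions, not the authors' implementation. Trajectories are assumed to be lists of (x, y, z) positions sampled at a fixed time step `dt`, the local cost is assumed to be Euclidean distance, and the function names (`dtw_distance`, `discrete_frechet`, `jerk_cost`) are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Assumption: a trajectory is a sequence of (x, y, z) positions sampled at a fixed dt.
Point = Tuple[float, float, float]


def dtw_distance(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Classic O(n*m) dynamic time warping with Euclidean local cost."""
    n, m = len(p), len(q)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning p[:i] with q[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(p[i - 1], q[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def discrete_frechet(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Discrete Fréchet distance via the Eiter-Mannila dynamic program."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                # Best coupling so far, limited by the current leash length d
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def jerk_cost(p: Sequence[Point], dt: float) -> float:
    """Approximate 0.5 * integral of squared jerk using third forward differences."""
    if len(p) < 4:
        return 0.0
    total = 0.0
    for k in range(len(p) - 3):
        # Third forward difference divided by dt^3 approximates jerk on each axis
        jerk = [
            (p[k + 3][a] - 3 * p[k + 2][a] + 3 * p[k + 1][a] - p[k][a]) / dt**3
            for a in range(3)
        ]
        total += sum(c * c for c in jerk) * dt
    return 0.5 * total


if __name__ == "__main__":
    human = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    agent = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    print(dtw_distance(human, agent), discrete_frechet(human, agent))
    print(jerk_cost([(k**3, 0, 0) for k in range(4)], 1.0))
```

DTW and the Fréchet distance capture temporal and spatial similarity respectively, while the jerk cost scores smoothness. In practice one would likely resample or normalize trajectories (for example, by duration and movement amplitude) before comparing human and agent motion; that preprocessing is omitted here.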
Sobhi et al. (Thu,) studied this question.