Grasping assistance is essential for restoring autonomy in individuals with motor impairments, particularly in unstructured environments where object categories and user intentions are diverse and unpredictable. We present OVGrasp, a hierarchical control framework for grasp assistance that integrates RGB-D vision, open-vocabulary prompts, and voice commands to enable robust multimodal interaction. To enhance generalisation in open environments, OVGrasp incorporates a vision-language foundation model with an open-vocabulary mechanism, which enables zero-shot detection of previously unseen objects without retraining. A multimodal decision maker further fuses spatial and linguistic cues to infer user intent, such as grasp or release, in situations involving multiple objects. We deploy the complete framework on a custom egocentric-view wearable exoskeleton and conduct systematic evaluations on fifteen objects across three grasp types. Experimental results with ten participants show that OVGrasp achieves a grasping ability score (GAS) of 87.00%, surpassing existing baselines and providing improved kinematic alignment with natural hand movement.

• OVGrasp: a hierarchical framework for grasp assistance.
• Open-vocabulary detection enables zero-shot generalisation to unseen objects.
• Multimodal decision-making fuses vision, depth, and speech for intent detection.
• Integrated in a soft hand exoskeleton with egocentric RGB-D sensing.
• Achieves superior grasping ability score and improved joint kinematics in tests.
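The multimodal decision step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` record, the `select_target` function, the reach limit, and the fusion weights are all hypothetical, chosen only to show how a transcribed voice command (a linguistic cue) and depth-derived object positions (a spatial cue) might be combined to pick one object when several are in view.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence, Tuple

ACTIONS = ("grasp", "release")  # the two intents named in the text


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of an open-vocabulary detector plus depth lookup."""
    label: str                               # text prompt that matched the object
    score: float                             # detector confidence in [0, 1]
    position_m: Tuple[float, float, float]   # centroid in the camera frame, metres


def parse_command(utterance: str) -> Tuple[Optional[str], str]:
    """Split a transcribed command into (action, remaining words)."""
    words = utterance.lower().split()
    action = next((w for w in words if w in ACTIONS), None)
    rest = " ".join(w for w in words if w not in ACTIONS)
    return action, rest


def select_target(
    detections: Sequence[Detection],
    utterance: str,
    max_reach_m: float = 0.6,   # assumed reach limit, not from the paper
    w_lang: float = 0.6,        # assumed weight of the linguistic cue
    w_near: float = 0.4,        # assumed weight of the spatial cue
) -> Optional[Tuple[str, Detection]]:
    """Return (action, target) or None if no intent or no reachable object."""
    action, rest = parse_command(utterance)
    if action is None:
        return None
    best, best_score = None, 0.0
    for det in detections:
        dist = sqrt(sum(c * c for c in det.position_m))
        if dist > max_reach_m:
            continue  # ignore objects outside the assumed reach
        lang = 1.0 if det.label.lower() in rest else 0.0   # object named in speech?
        near = 1.0 - dist / max_reach_m                    # closer objects score higher
        fused = det.score * (w_lang * lang + w_near * near)
        if fused > best_score:
            best, best_score = det, fused
    return (action, best) if best is not None else None


if __name__ == "__main__":
    scene = [
        Detection("cup", 0.9, (0.0, 0.0, 0.5)),
        Detection("bottle", 0.8, (0.1, 0.0, 0.3)),
    ]
    print(select_target(scene, "grasp the cup"))
```

In this sketch the named object wins even though another object is closer, because the linguistic weight is larger than the spatial one; with no object named, the nearest confident detection is chosen instead. A real system would likely replace the substring match with the language model's own text-image similarity.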
Hu et al. (Tue,) studied this question.