Autonomous mobile manipulators operating in unknown environments must tightly couple exploration, perception, and manipulation under strict computational and sensing constraints. This paper presents a fully onboard exploration-to-grasp system that enables a mobile cobot to autonomously search for, detect, and grasp a target object specified by a natural-language prompt without prior maps or object-specific training. The proposed system integrates frontier-based exploration with camera-aware coverage planning to reduce redundant motion and promote informative viewpoints. Open-vocabulary object detection is performed using a lightweight vision-language model optimized for real-time inference on embedded GPU hardware. Upon stable detection, a deterministic detection-to-grasp pipeline computes feasible standoff poses and executes a constrained grasp sequence tailored to the target object geometry. The approach is evaluated in two real-world indoor environments with multiple exploration scenarios. Experimental results demonstrate that frontier-based exploration significantly outperforms a straight-line baseline in terms of execution time, traveled path length, and grasp success, particularly in environments with occlusions and narrow passages. The findings highlight the practical feasibility of integrating open-vocabulary perception and autonomous exploration for reliable mobile manipulation on resource-constrained cyber-physical systems.
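To make the frontier-based exploration idea concrete, the following is a minimal sketch of frontier detection on a 2D occupancy grid, where a frontier cell is a known-free cell bordering at least one unknown cell. The grid encoding (`FREE`, `OCCUPIED`, `UNKNOWN`), the function name `find_frontier_cells`, and the 4-connected neighborhood are illustrative assumptions, not details taken from the system described above, which also adds camera-aware coverage planning that is not shown here.

```python
from typing import List, Tuple

# Hypothetical cell encoding for this sketch (not from the paper).
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def find_frontier_cells(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (row, col) of every free cell with a 4-connected unknown neighbor."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue  # only known-free cells can be frontiers
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbor is enough
    return frontiers


if __name__ == "__main__":
    example = [
        [FREE, FREE, UNKNOWN],
        [FREE, OCCUPIED, UNKNOWN],
        [FREE, FREE, FREE],
    ]
    print(find_frontier_cells(example))
```

In a full exploration loop, the detected frontier cells would typically be clustered and ranked (for example by distance or expected information gain) before one is chosen as the next navigation goal; that ranking step is left out of this sketch.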
Nordhoff et al. (Thu,) studied this question.