What question did this study set out to answer?

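The study asks whether an open-vocabulary object detector can be adapted quickly to new robot vision requirements by tuning its prompt embeddings and retrieving the best-matching prompts from a vector database, rather than retraining or fine-tuning the whole model.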

March 25, 2026 | Open Access

Prompt Tuning and Retrieval in Open-Vocabulary Object Detection for Configurable Robot Vision

Key Points

Proposed coupling prompt tuning with a vector database from which the best prompts are retrieved.
Implemented the method by adapting the Grounding DINO model.
Verified its effectiveness on the LVIS benchmark.
Showed that prompt tuning enables iterative improvement without much impact on the model's performance on previous requirements.
Demonstrated faster adaptability in low-volume high-mixture scenarios.

Abstract

Robotic vision modules in low-volume high-mixture scenarios frequently need to be adapted to new requirements, but retraining or fine-tuning well-established object detection models may be too slow and resource intensive. Open-vocabulary object detection is a promising alternative, and fine-tuning the prompt embeddings can solve situations where text prompts are not sufficient. We propose coupling this so-called prompt tuning with a vector database for retrieving the best prompts to differentiate challenging scenarios. This enables iterative improvement without much impact on the model’s performance with respect to previous requirements. We implemented the proposed method by adapting Grounding DINO and experimentally verified its effectiveness using the LVIS benchmark.
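
The retrieval idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `PromptStore` class, the use of cosine similarity, the choice of scene feature vectors as lookup keys, and all vectors and labels below are assumptions made only for illustration. A plain Python list stands in for the vector database, and short lists of floats stand in for tuned prompt embeddings.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class PromptStore:
    """Toy in-memory stand-in for a vector database of tuned prompts (hypothetical)."""
    # Each entry is (key vector, label, tuned prompt embedding).
    entries: list = field(default_factory=list)

    def add(self, key, label, prompt_embedding):
        # Adding an entry leaves all previously stored prompts unchanged.
        self.entries.append((key, label, prompt_embedding))

    def retrieve(self, query, k=1):
        """Return the k entries whose keys are most similar to the query."""
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return ranked[:k]


# Illustrative data only: keys and prompt embeddings are made up.
store = PromptStore()
store.add([1.0, 0.0, 0.0], "bolt_m6", [0.2, 0.9])
store.add([0.0, 1.0, 0.0], "washer", [0.7, 0.1])

# Look up the stored prompt whose key is closest to a new query vector.
best = store.retrieve([0.9, 0.1, 0.0], k=1)[0]
print(best[1])
```

In this sketch a new requirement is handled by adding one more entry to the store, which is one way to read the abstract's point that earlier requirements are largely unaffected. A real system would pass the retrieved embedding to the detector in place of an encoded text prompt.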

Cite This Study

Fixl et al. (2026) studied this question.

synapsesocial.com/papers/69c37bd4b34aaaeb1a67e9a5 https://doi.org/10.1016/j.procs.2026.02.133