What question did this study set out to answer?

The study aims to develop a specialized Questioner model for chat-based image retrieval by generating effective questions for user input.

April 17, 2026

A Bootstrap Pipeline for Chat-Based Image Retrieval with Effective Question Generation

Key points

Developed two principles for generating simple, effective questions
Introduced a bootstrap training methodology for collecting retrieval-oriented dialog data
Trained the Questioner model and the image Retriever concurrently
Established a fair protocol for model comparison
Addressed the scarcity of dialog-to-image retrieval data
Achieved state-of-the-art performance with a fine-tuned 8B model, surpassing GPT-4o and GPT-4-Turbo
Demonstrated that effective question generation improves image retrieval accuracy

Abstract

Chat-based image retrieval uses Large Language Models (LLMs) to guide user input toward more specific and precise search results; the LLM can enhance this process by asking the user retrieval-oriented questions that elicit additional details about the target image. Despite the potential of this approach, no specialized Questioner model has been developed for the task, owing to three significant challenges: (a) the difficulty of determining the optimal questions to ask; (b) the lack of a suitable protocol for fair model comparison; and (c) the notable scarcity of dialog-to-image retrieval data. To address these challenges, this paper develops two fundamental principles that keep the generated questions simple and effective while enabling a fair comparison and an accurate estimation of data quality and model performance. A bootstrap training methodology is introduced to collect retrieval-oriented dialog data and concurrently train the Questioner and the image Retriever. Under a fair comparison protocol, extensive experiments demonstrate that the proposed method not only closes the critical data gap but also achieves state-of-the-art results, substantially surpassing GPT-4o and GPT-4-Turbo with a fine-tuned 8B model.
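The data-collection half of a bootstrap round can be illustrated with a toy Python sketch. It is not the authors' implementation, and every name in it is hypothetical: the attribute-set gallery, the word-overlap Retriever (`rank`), the Questioner (`ask`, which uses a classic "halve the remaining candidates" heuristic swapped in for illustration, not the paper's two principles), the yes/no simulated user in `run_dialog`, and the "keep only dialogs that put the target at rank 1" filter in `bootstrap`. The fine-tuning step that would retrain the Questioner and Retriever on the kept dialogs is left as a comment.

```python
# Toy sketch of one bootstrap data-collection round for chat-based image retrieval.
# Everything here is hypothetical: real systems use LLM Questioners and
# embedding-based Retrievers rather than attribute sets and word overlap.

# Hypothetical gallery: each image is described by a small set of attribute words.
GALLERY = {
    "img_0": {"dog", "beach", "sunset"},
    "img_1": {"dog", "park", "ball"},
    "img_2": {"cat", "sofa", "indoor"},
    "img_3": {"dog", "beach", "daytime"},
}


def rank(pos, neg, gallery):
    """Hypothetical Retriever: score = confirmed attributes present minus denied attributes present."""
    def score(img):
        attrs = gallery[img]
        return len(pos & attrs) - len(neg & attrs)
    # Highest score first; ties broken by image id so the ranking is deterministic.
    return sorted(gallery, key=lambda img: (-score(img), img))


def ask(pos, neg, gallery):
    """Hypothetical Questioner: pick the unasked attribute that best halves the consistent candidates."""
    cands = [img for img, attrs in gallery.items() if pos <= attrs and not (neg & attrs)]
    options = sorted({a for img in cands for a in gallery[img]} - pos - neg)
    if len(cands) <= 1 or not options:
        return None  # nothing left to disambiguate
    # |2 * count - n| is 0 when the attribute splits the candidates exactly in half.
    return min(options, key=lambda a: abs(2 * sum(a in gallery[c] for c in cands) - len(cands)))


def run_dialog(target, gallery, max_turns=5):
    """Simulated user answers yes/no from the target's attributes; returns (dialog, 1-based target rank)."""
    pos, neg, dialog = set(), set(), []
    for _ in range(max_turns):
        attr = ask(pos, neg, gallery)
        if attr is None:
            break
        answer = attr in gallery[target]
        (pos if answer else neg).add(attr)
        dialog.append((f"Does the image show {attr}?", "yes" if answer else "no"))
    return dialog, rank(pos, neg, gallery).index(target) + 1


def bootstrap(gallery, max_turns=5):
    """One hypothetical bootstrap round: keep only dialogs that put the target at rank 1."""
    kept = []
    for target in gallery:
        dialog, target_rank = run_dialog(target, gallery, max_turns)
        if target_rank == 1:
            kept.append((dialog, target))
    # A real pipeline would now fine-tune the Questioner and Retriever on `kept`
    # and repeat the round with the improved models (omitted in this sketch).
    return kept


if __name__ == "__main__":
    for dialog, target in bootstrap(GALLERY):
        print(target, dialog)
```

With this toy gallery, every target reaches rank 1 within five turns, while capping dialogs at one turn keeps only `img_0` and `img_1` (the latter only because of the id tie-break). The surviving dialogs stand in for the retrieval-oriented dialog data that each bootstrap round collects.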

Cite This Study

Chen et al. studied this question.

synapsesocial.com/papers/69e1ceaa5cdc762e9d857a62 https://doi.org/10.1145/3807947