We introduce GUing, a GUI search engine that lets developers retrieve relevant app designs using natural-language queries. The system is powered by GUIClip, a specialized vision-language model trained on a novel dataset of 135k screenshots and captions extracted from Google Play. By integrating visual and textual data, the approach significantly outperforms traditional text-only models and achieves high accuracy on text-to-GUI retrieval.
Wei et al. studied this question.
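To make the text-to-GUI retrieval step concrete, the following is a minimal sketch and not the GUing implementation. It assumes that a vision-language model such as GUIClip maps both queries and screenshots into a shared embedding space and that candidates are ranked by cosine similarity, as is common for CLIP-style models; the abstract does not confirm either detail. The function names `cosine` and `rank_screens`, the screen names, and the three-dimensional vectors are hypothetical stand-ins for real model outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_screens(query_vec, screen_vecs, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in screen_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical screenshot embeddings; a real system would obtain these
# from the image encoder of the vision-language model.
screens = {
    "login_screen": [0.9, 0.1, 0.0],
    "settings_screen": [0.1, 0.9, 0.1],
    "map_screen": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a query such as "sign-in page with email field";
# a real system would obtain it from the text encoder.
query = [0.8, 0.2, 0.1]

results = rank_screens(query, screens, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In this toy setting the query vector lies closest to the hypothetical login screen, so that screen is ranked first. With a real model, only the source of the vectors would change; the ranking step would stay the same under the stated assumption.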