Traditional object recognition methods, which rely on retraining pre-trained models on specific datasets, are impractical in dynamic retail environments where product offerings and designs change frequently. To address this, this paper introduces a novel Generative AI pipeline for object recognition and localization, designed to maintain an accurate digital twin of supermarket shelves. The proposed pipeline comprises shelf filtering, bounding box detection, product description extraction, text recognition, and product identification. By using generative AI to extract visual and textual information from shelf images, the system identifies products accurately without constant model retraining. Its modular architecture improves scalability and integrates with a Digital Twin framework that uses Eclipse Ditto for periodic or batch-based data management. In a simulated retail environment, the approach achieves approximately 85% product identification accuracy, correctly identifying products within individual shelf cells and helping keep the digital twin representation up to date. Beyond addressing challenges in retail, this research also points to potential applications in other domains.
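To make the modular design concrete, here is a minimal sketch of how such stages might be chained. It is not the authors' implementation. It assumes that bounding box detection has already produced one box per shelf cell. The generative-model description and the text recognition steps are stubbed with lookup functions. `shelf_filter` is a hypothetical stand-in that drops boxes that are too small. Product identification uses simple token overlap against an editable catalog, which illustrates why new products need a catalog update rather than model retraining. The names `Detection`, `annotate`, `make_identifier`, and `to_twin_state`, the cell labels, and the SKU identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Detection:
    cell: str                        # shelf cell label, e.g. "A1" (hypothetical scheme)
    box: tuple                       # (x, y, w, h) in pixels, assumed from a box detector
    description: str = ""            # would come from a generative model (stubbed here)
    text: str = ""                   # would come from text recognition (stubbed here)
    product_id: Optional[str] = None

# Every stage maps a list of detections to a list of detections, so stages compose freely.
Stage = Callable[[List[Detection]], List[Detection]]

def shelf_filter(min_area: int) -> Stage:
    # Hypothetical stand-in for shelf filtering: drop boxes too small to be a product facing.
    def stage(dets: List[Detection]) -> List[Detection]:
        return [d for d in dets if d.box[2] * d.box[3] >= min_area]
    return stage

def annotate(attr: str, fn: Callable[[Detection], str]) -> Stage:
    # Generic stage that fills one field (description or recognized text) per detection.
    def stage(dets: List[Detection]) -> List[Detection]:
        for d in dets:
            setattr(d, attr, fn(d))
        return dets
    return stage

def make_identifier(catalog: Dict[str, str], threshold: float = 0.5) -> Stage:
    # Score each catalog entry by the fraction of its name tokens found in the
    # extracted text plus description; below the threshold, leave the cell unidentified.
    def stage(dets: List[Detection]) -> List[Detection]:
        for d in dets:
            query = set(f"{d.text} {d.description}".lower().split())
            best_id, best_score = None, 0.0
            for pid, name in catalog.items():
                tokens = set(name.lower().split())
                score = len(tokens & query) / len(tokens) if tokens else 0.0
                if score > best_score:
                    best_id, best_score = pid, score
            d.product_id = best_id if best_score >= threshold else None
        return dets
    return stage

def run_pipeline(dets: List[Detection], stages: List[Stage]) -> List[Detection]:
    for stage in stages:
        dets = stage(dets)
    return dets

def to_twin_state(dets: List[Detection]) -> Dict[str, Optional[str]]:
    # Per-cell state that could be written as feature properties of a digital twin.
    return {d.cell: d.product_id for d in dets}

# Usage with stubbed recognition results.
catalog = {"sku-001": "Choco Crunch cereal", "sku-002": "Oat Milk 1L"}
fake_ocr = {"A1": "CHOCO CRUNCH", "A2": "oat milk", "A3": "???"}
dets = [Detection("A1", (0, 0, 50, 80)), Detection("A2", (60, 0, 50, 80)),
        Detection("A3", (120, 0, 40, 70)), Detection("A4", (0, 0, 5, 5))]
stages = [
    shelf_filter(min_area=100),
    annotate("description", lambda d: "cereal box" if d.cell == "A1" else ""),
    annotate("text", lambda d: fake_ocr.get(d.cell, "")),
    make_identifier(catalog),
]
state = to_twin_state(run_pipeline(dets, stages))
print(state)  # → {'A1': 'sku-001', 'A2': 'sku-002', 'A3': None}
```

Because each stage shares one signature, any single component can be swapped independently. For example, a stronger matcher could replace the token-overlap identifier without touching the other stages. In a Digital Twin setup like the one described, the per-cell dictionary could be pushed periodically or in batches to an Eclipse Ditto Thing as feature properties. That HTTP call is deliberately omitted here.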
Teixeira et al. (Sun,) studied this question.