August 12, 2024 · Open Access

A Survey on Referring Image Segmentation

Abstract

With the growing popularity of artificial intelligence models and rising expectations for their application in many fields, referring image segmentation (RIS) has attracted considerable attention from researchers. RIS is one of the most fundamental and challenging vision-language cross-modal tasks at the intersection of computer vision and natural language processing; it aims to segment the instance in an image that corresponds to a given natural language expression. This paper aims to provide as comprehensive an overview as possible, covering the mainstream benchmark datasets and their statistics, common evaluation metrics, a selection of crucial and representative RIS works, and the performance evaluation of each method. The included RIS methods are described in terms of their core model structure and their procedure for performing RIS, and are grouped into five classes according to how multimodal information is processed. The paper closes with a brief outlook on possible future directions for RIS research.

Cite This Study

Honglin Wang studied this question.

synapsesocial.com/papers/68e5ca68b6db64358756082d https://doi.org/10.62051/a2t2ec16
