For visually impaired individuals, image captioning is a crucial task: a deep learning model interprets an image and generates a descriptive sentence, letting them understand the image's content through words. However, existing image captioning models require large amounts of manually annotated data. Fortunately, the emergence of unsupervised methods provides a new approach to image captioning. Our proposed model, Fast RF-UIC, achieves unsupervised captioning through a purpose-built Pre-trainer. Compared with existing pre-trained models, the Pre-trainer trains faster and has a shorter training cycle. The R2-Inception-V4 model is designed as the encoder; it fuses the Res2Net structure with Inception-V4 to extract more image features. Bi-FGRU is designed as the decoder, in which the FReLU activation function is used to improve character representation ability in two-dimensional space. Furthermore, we expanded the corpus used in existing unsupervised image captioning and included additional captions for common objects, effectively enhancing the model’s generalization ability. In experiments, Fast RF-UIC achieved higher scores than existing unsupervised image captioning methods on several text evaluation metrics, including BLEU, ROUGE, and CIDEr.
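To illustrate what a funnel activation (FReLU) does in two-dimensional space, the following is a minimal sketch of the generic formulation, y = max(x, T(x)), where T(x) is a learned spatial condition computed over a 3x3 window. It is not the authors' Bi-FGRU implementation: the function name `frelu`, the single-channel list representation, and the fixed `kernel` weights are assumptions made for illustration, and the batch normalization that usually follows the windowed sum is omitted.

```python
def frelu(x, kernel, bias=0.0):
    """Funnel activation on one channel: out[i][j] = max(x[i][j], T(x)[i][j]).

    x      : 2D list of floats (one feature-map channel)
    kernel : 3x3 list of floats (hypothetical weights; learned in practice)
    bias   : scalar added to each windowed sum
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Spatial condition T(x): weighted sum over the 3x3 neighbourhood,
            # treating positions outside the map as zero (zero padding).
            t = bias
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        t += kernel[di + 1][dj + 1] * x[ii][jj]
            # Funnel step: keep the larger of the input and its condition.
            out[i][j] = max(x[i][j], t)
    return out


def relu(x):
    """Plain ReLU for comparison: max(x, 0) element-wise."""
    return [[max(v, 0.0) for v in row] for row in x]


if __name__ == "__main__":
    x = [[-1.0, -1.0], [-1.0, 5.0]]
    ones = [[1.0] * 3 for _ in range(3)]
    print(relu(x))         # negatives are clamped to zero
    print(frelu(x, ones))  # negatives are replaced using neighbouring values
```

With an all-zero kernel and zero bias the condition T(x) is 0 everywhere, so `frelu` reduces to ReLU. With non-zero weights, each output depends on its spatial neighbours rather than on a fixed threshold, which is the sense in which the activation operates in two dimensions.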
Rui Yang
Tianjin University
Xiayu Cui
Qinzhi Qin
Displays
Guilin University of Electronic Technology
Yang et al. (Mon,) studied this question.
synapsesocial.com/papers/6a11f9300514fa642cccd21d — DOI: https://doi.org/10.1016/j.displa.2023.102490