February 27, 2024

A CNN-LSTM based approach for image captioning

Key Points

Key points are not available for this paper at this time.

Abstract

In this age of information and visual data, it has become necessary in many fields to draw meaningful conclusions from these visual data and express them in a textual context. A visual data makes it more understandable and valuable with a textual explanation. It will also greatly accelerate the scanning of large amounts of data in many application areas. In basic fields such as education or medicine, visual-text pairs are of great importance to use as materials. In addition, many autonomous vehicles can be designed that enable disabled individuals to use visual and textual information together. Such studies have gained a wider application area, especially with the advancement of deep learning technology. In this study, a study was carried out on the Flickr8k dataset using Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM) technologies to title visual data. This model provides an integrated structure for understanding visual data and producing textual descriptions. The accuracy of the caption value created with the BLEU-1 metric was evaluated. In addition, other studies carried out together with this study were discussed and information was given about the performance of these methods.

Bookmark

Cite This Study

Balık et al. (Tue,) studied this question.

synapsesocial.com/papers/68e77692b6db6435876eb6d8 https://doi.org/https://doi.org/10.1049/icp.2024.1019

Also Consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

Bookmark