In an age of abundant information and visual data, many fields need to draw meaningful conclusions from visual data and express them as text. Visual data becomes more understandable and more valuable when it is accompanied by a textual explanation, and such explanations greatly speed up the scanning of large amounts of data in many application areas. In basic fields such as education and medicine, image-text pairs are important as working materials. In addition, autonomous vehicles can be designed that enable individuals with disabilities to use visual and textual information together. Studies of this kind have found wider application, especially with advances in deep learning. In this study, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks were applied to the Flickr8k dataset to generate captions for visual data. The resulting model provides an integrated structure for understanding visual data and producing textual descriptions. The accuracy of the generated captions was evaluated with the BLEU-1 metric. Related studies are also discussed, with information on the performance of their methods.
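For readers unfamiliar with the metric, the following is a minimal sketch of how a sentence-level BLEU-1 score can be computed: clipped unigram precision multiplied by a brevity penalty. It is not the authors' evaluation code. The function name `bleu1`, the whitespace tokenization, and the example captions are all assumptions made for illustration.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1 (illustrative sketch, not the paper's code).

    candidate:  list of tokens in the generated caption
    references: non-empty list of token lists (human captions)
    """
    if not candidate:
        return 0.0

    # For each token, the highest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for token, count in Counter(ref).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)

    # Clip each candidate token count by its maximum reference count.
    cand_counts = Counter(candidate)
    clipped = sum(min(count, max_ref_counts[token])
                  for token, count in cand_counts.items())
    precision = clipped / len(candidate)

    # Brevity penalty uses the reference length closest to the candidate
    # length; ties are broken in favor of the shorter reference.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

    return brevity_penalty * precision


# Hypothetical captions, tokenized by whitespace (an assumption).
generated = "a dog runs on the grass".split()
human = [
    "a dog is running on the grass".split(),
    "the dog runs across a field".split(),
]
score = bleu1(generated, human)
```

Because BLEU-1 only counts matching single words, it rewards captions that use the right vocabulary but does not check word order, so a high score does not guarantee a fluent caption.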
Balık et al. studied this question.