This paper compares several deep learning models for generating captions for images from the Flickr 8k dataset. Each model combines a CNN encoder, which extracts features from an image, with a recurrent neural network that generates a caption from those features. The CNN encoders used are VGG16 and InceptionV3. The extracted features are passed to a unidirectional or a bidirectional LSTM, which generates the caption. Captions are decoded from the vocabulary using either beam search or greedy search. The generated captions are then compared with the actual captions using the Bilingual Evaluation Understudy (BLEU) score, which measures how close a given sentence is to another sentence. The BLEU scores of captions generated with beam search and with greedy search are analyzed and compared to determine which decoding strategy performs better.
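The difference between the two decoding strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup table `TOY_MODEL` is a hypothetical stand-in for the LSTM decoder, and the function names and probabilities are invented for illustration. Greedy search is treated here as beam search with a beam width of 1.

```python
import math

# Hypothetical toy "decoder": maps the last token to next-token probabilities.
# In a real captioning model these would come from the LSTM conditioned on
# the CNN image features; the numbers below are made up for illustration.
TOY_MODEL = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
}


def next_token_probs(tokens):
    """Return the next-token distribution given the tokens so far."""
    return TOY_MODEL[tokens[-1]]


def beam_search(beam_width, max_len=10):
    """Decode a caption, keeping the `beam_width` best partial captions.

    With beam_width=1 this reduces to greedy search.
    Returns (caption, probability of the full token sequence).
    """
    beams = [(0.0, ["<start>"])]  # (log-probability, tokens)
    finished = []
    for _ in range(max_len):
        # Expand every partial caption by every possible next token.
        candidates = []
        for logp, tokens in beams:
            for tok, p in next_token_probs(tokens).items():
                candidates.append((logp + math.log(p), tokens + [tok]))
        # Keep only the highest-scoring candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, tokens in candidates[:beam_width]:
            if tokens[-1] == "<end>":
                finished.append((logp, tokens))
            else:
                beams.append((logp, tokens))
        if not beams:
            break
    finished.extend(beams)  # captions cut off by max_len, if any
    best_logp, best_tokens = max(finished, key=lambda c: c[0])
    words = [t for t in best_tokens if t not in ("<start>", "<end>")]
    return " ".join(words), math.exp(best_logp)


greedy_caption, greedy_prob = beam_search(beam_width=1)
beam_caption, beam_prob = beam_search(beam_width=2)
print(greedy_caption, round(greedy_prob, 2))
print(beam_caption, round(beam_prob, 2))
```

In this toy setting, greedy search commits to the locally most probable first word and ends up with a less probable caption overall, while a beam width of 2 keeps the alternative alive long enough to find a more probable one. Whether that translates into a higher BLEU score on real captions is the empirical question the paper examines.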
Takkar et al. (Thu,) studied this question.