December 5, 2025

ChitraVivran: Real-Time Attention-Based Hindi Image Captioning with Boosted Contextual Descriptors

Key Points

ChitraVivran achieves improved results in Hindi image captioning, emphasizing real-time capabilities.
Quantitative evaluation shows improved Bilingual Evaluation Understudy (BLEU) scores (BLEU-3: 26.73%, BLEU-4: 16.82%), alongside improved qualitative results against baselines.
This framework utilizes advanced attention mechanisms and visual feature extraction for better embedding fusion.
The integration of Hindi captioning in vision aid tools shows promise, supporting multi-captioning functionality.

Abstract

Image captioning is the task of generating concise natural language descriptions of given images. It integrates computer vision and natural language processing, two cutting-edge artificial intelligence disciplines. Image captioning is now widely used in vision assistance, healthcare, remote sensing, and security. While English image captioning has advanced across domains, Hindi image captioning remains underdeveloped, lacking features such as autonomy, ensemble extraction, hybrid attention, and vision-boosted decoding. Additionally, integrating Hindi image captioning into vision aid tools has been infeasible because of the lack of real-time and multi-captioning ability. This research introduces ChitraVivran, a novel real-time, end-to-end Hindi image captioning framework. Our framework employs an ensemble visual feature extraction module to generate boosted contextual descriptors, enriching the fusion of visual and semantic embeddings. A dataset named PASCAL 1K-Hindi has also been manually created by translating the PASCAL 1K-English image captioning dataset into Hindi. Various pipeline configurations, combining ensemble feature extractors, attention mechanisms, and decoders, have also been developed and tested for Hindi image captioning. To enhance the applicability of Hindi image captioning in vision aid tools, our framework also incorporates real-time captioning and customized multi-captioning support. Experimental analysis on the Flickr 8K-Hindi and our newly developed PASCAL 1K-Hindi datasets indicates that ChitraVivran produces improved quantitative (Bilingual Evaluation Understudy (BLEU)-3: 26.73%, BLEU-4: 16.82%) and qualitative results against baselines. Our framework demonstrates high performance in real-time captioning.
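For readers unfamiliar with the Bilingual Evaluation Understudy (BLEU) metric reported above, the following is a minimal sketch of how a sentence-level BLEU-n score can be computed: clipped n-gram precisions for n = 1..N, combined by a geometric mean and scaled by a brevity penalty. It is an illustration only, not the authors' evaluation script. The function names, the whitespace tokenization, the absence of smoothing, and the Hindi example sentences are all assumptions made here; published scores such as those above are typically computed at corpus level with an established toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU-max_n with uniform weights (illustrative)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision gives a zero score
        precisions.append(clipped / total)
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical example captions, tokenized on whitespace.
reference = "एक कुत्ता घास पर दौड़ रहा है".split()
candidate = "एक कुत्ता घास पर बैठा है".split()
bleu3 = bleu(candidate, [reference], max_n=3)
bleu4 = bleu(candidate, [reference], max_n=4)
```

BLEU-4 is never higher than BLEU-3 for the same unsmoothed sentence-level output, since it additionally requires matching 4-grams; this is consistent with the ordering of the two scores reported in the abstract.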
