What question did this study set out to answer?

To develop a model that improves the extraction of essential information from videos while preserving context and visual content.

April 20, 2026 | Open Access

Deep video summarization using a Correlation Attention model

Key Points

Aimed to improve the extraction of essential information from videos while preserving context and visual content.
Proposed a Correlation Attention model for semantic video summarization.
Utilized a stacked BiLSTM neural network architecture to enhance learning capabilities.
Conducted experiments on TVSum and SumMe datasets for model testing.
Achieved an accuracy of 0.873 and an F-score of 0.623.
Demonstrated improved summarization performance compared to existing methods.
Average fidelity of 0.968 indicates high reliability of the summaries produced.
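
On TVSum and SumMe, an F-score is commonly computed from the overlap between a predicted summary and a reference summary. This page does not state the paper's exact evaluation protocol, so the following Python sketch only illustrates that common overlap-based definition. The function name `summary_f_score` and the toy frame selections are hypothetical.

```python
def summary_f_score(predicted, reference):
    """F-score between two binary per-frame selections (1 = frame kept).

    Assumption: both summaries are given as equal-length 0/1 lists,
    one entry per frame. This is an illustrative definition, not the
    paper's stated protocol.
    """
    if len(predicted) != len(reference):
        raise ValueError("selections must cover the same number of frames")

    # Frames kept by both the predicted and the reference summary.
    overlap = sum(p and r for p, r in zip(predicted, reference))
    if overlap == 0:
        return 0.0

    precision = overlap / sum(predicted)   # share of predicted frames that are correct
    recall = overlap / sum(reference)      # share of reference frames that were recovered
    return 2 * precision * recall / (precision + recall)


# Toy example: 8 frames, 3 selected in each summary, 2 in common.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 0, 0, 0, 1, 1, 0, 0]
score = summary_f_score(predicted, reference)
print(f"F-score: {score:.3f}")
```

In this toy case precision and recall are both 2/3, so the F-score is also 2/3. Benchmark evaluations typically compare against several human-made reference summaries per video and aggregate the per-reference scores; that aggregation step is omitted here.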

Abstract

The ability to condense videos and extract essential information has become increasingly important with the continuous rise of online video content. Video summarization methods strive to condense lengthy videos into concise representations while preserving diverse context and representative visual content. However, existing techniques often struggle to capture the complex temporal dependencies, key moments and contextual details within videos, resulting in suboptimal summarization performance. Moreover, existing methods place insufficient emphasis on selecting contextually relevant keyframes with diverse visual content. To address these limitations, this study proposes a Correlation Attention (CA) model specifically designed for semantic video summarization tasks. The proposed correlation attention mechanism strengthens semantic understanding by modeling inter-frame dependencies, allowing the summarizer to focus on the frames that contribute most to the overall narrative. Integrating the CA model with a stacked Bidirectional Long Short-Term Memory (BiLSTM) neural network yields substantial improvements in summarization performance over existing methods across diverse video datasets. The Deep Video Summarization (DVS), TVSum and SumMe video datasets are used for training, validation and testing of the model. On quantitative metrics, the model achieves an accuracy of 0.873, an F-score of 0.623 and an average fidelity of 0.968, outperforming the state-of-the-art video summarization model. The findings underscore the efficacy of the Correlation Attention model in enhancing semantic video summarization, promising advancements in video content analysis, retrieval, compression and other applications requiring efficient video understanding. This research has potential benefits for a wide range of stakeholders, including web users, information seekers, content creators and educational institutions.
• A Correlation Attention model for generating contextually meaningful video summaries.
• A hybrid architecture that learns spatio-temporal features and long-term dependencies.
• Experiments on TVSum and SumMe datasets show superior video summary results.
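
The abstract describes correlation attention as modeling inter-frame dependencies so that the summarizer can focus on the most relevant frames. This page does not give the paper's formulation, so the Python sketch below is only one plausible reading. It assumes cosine similarity as the correlation measure and a softmax over each row as the attention weights. The stacked BiLSTM is omitted, and all function names and the toy frame features are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed correlation measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(x - peak) for x in values]
    total = sum(exps)
    return [e / total for e in exps]


def correlation_attention(frames):
    """Return (weights, attended) for a list of per-frame feature vectors.

    weights[i][j] is how strongly frame i attends to frame j;
    attended[i] is the attention-weighted mix of all frame features.
    """
    weights = [softmax([cosine(f, g) for g in frames]) for f in frames]
    dim = len(frames[0])
    attended = [
        [sum(w * g[d] for w, g in zip(row, frames)) for d in range(dim)]
        for row in weights
    ]
    return weights, attended


def frame_importance(weights):
    """Average attention each frame receives from all frames (column means)."""
    n = len(weights)
    return [sum(row[j] for row in weights) / n for j in range(n)]


# Toy features: frames 0 and 1 are near-duplicates, frame 2 is unrelated.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights, attended = correlation_attention(frames)
importance = frame_importance(weights)
print([round(score, 3) for score in importance])
```

In a full model, the attended features would typically be passed to a sequence model such as the stacked BiLSTM named in the abstract, which would then predict per-frame importance scores. Here the column means of the attention matrix stand in for that step.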

Cite This Study

K.C. et al. (Fri,) studied this question.

synapsesocial.com/papers/69e5c2d003c2939914028ca0 https://doi.org/10.1016/j.array.2026.100814
