What type of study is this?

This is a quantitative study.

October 3, 2025 (Open Access)

Customizable Length Constrained Image-Text Summarization via Knapsack Optimization

Key Points

The method achieves a ROUGE-1 score of 40.52 on the MSMO dataset, outperforming the compared baselines, including LEAD-3, Seq2Seq, Attention, and Transformer.
By utilizing knapsack optimization and deep learning, the approach adheres closely to user-defined length constraints.
The incorporation of a multimodal attention mechanism ensures a balanced integration of text and visual information.
Experimental results show that the approach has the lowest length variance among the compared methods, confirming its precise adherence to target summary lengths.
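The key points report ROUGE scores on a 0 to 100 scale and a length-variance comparison. As a rough illustration of what these two measures capture, the sketch below computes a simplified ROUGE-1 F1 (clipped unigram overlap with whitespace tokenization and no stemming) and one possible length-variance measure. The function names are hypothetical, and the length-variance definition (mean squared deviation of summary lengths from the target length) is an assumption: this summary does not state how the paper computes it. Official ROUGE tooling applies its own tokenization and optional stemming, so this sketch will not reproduce the reported 40.52.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens, no stemming."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count of each token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def length_variance(lengths: list[int], target: int) -> float:
    """Mean squared deviation of summary lengths from the target (assumed definition)."""
    return sum((n - target) ** 2 for n in lengths) / len(lengths)


if __name__ == "__main__":
    # Scale by 100 to match the way ROUGE scores are usually reported.
    print(round(100 * rouge1_f1("the cat sat", "the cat sat on the mat"), 2))  # 66.67
    print(length_variance([48, 50, 52], target=50))
```

Under this definition, a method that always hits the requested length exactly scores a variance of 0, which is why a lower value indicates tighter length control.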

Abstract

With the proliferation of multimedia data, controllable summary generation has become a key focus in AI-generated content. However, many traditional methods lack precise control over output length, often producing summaries that are either too verbose or too brief to meet diverse user needs. In this paper, we propose a length-customizable approach for multimodal image-text summarization that integrates combinatorial optimization with deep learning to address the length-control challenge. Specifically, we formulate the summarization task as a knapsack optimization problem, enhanced with a greedy algorithm to strictly adhere to user-defined length constraints. Additionally, we introduce a multimodal attention mechanism to ensure balanced and coherent integration of textual and visual information. To further improve semantic alignment, we employ a cross-modal matching strategy for image selection based on pre-trained vision-language models. Experimental evaluations on the MSMO dataset against baselines including LEAD-3, Seq2Seq, Attention, and Transformer show that our method achieves a ROUGE-1 score of 40.52, a ROUGE-2 score of 16.07, and a ROUGE-L score of 35.15, outperforming existing length-controllable baselines. Moreover, our approach attains the lowest length variance, confirming its precise adherence to target summary lengths. These results validate the effectiveness of our method in generating high-quality, length-constrained multimodal summaries.
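To make the knapsack framing concrete: each candidate sentence is an item whose "value" is an importance score and whose "weight" is its length, and the user's length limit is the knapsack capacity. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The scores are hypothetical inputs (the paper derives them with deep learning, which is not reproduced here), length is assumed to be measured in words, and the greedy rule shown (take sentences in decreasing score-per-word order, skip any that would overflow the budget) is the standard density heuristic for 0/1 knapsack. The `Sentence` class and `greedy_knapsack` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    score: float  # hypothetical importance score, e.g. predicted by a neural model

    @property
    def length(self) -> int:
        # Length cost in words (assumption: the paper's length unit may differ)
        return max(1, len(self.text.split()))


def greedy_knapsack(sentences: list[Sentence], budget: int) -> tuple[list[int], int]:
    """Pick sentence indices whose total length never exceeds `budget`.

    Greedy 0/1 knapsack heuristic: visit items in decreasing value density
    (score per word) and keep each one that still fits.
    """
    order = sorted(
        range(len(sentences)),
        key=lambda i: sentences[i].score / sentences[i].length,
        reverse=True,
    )
    chosen: list[int] = []
    used = 0
    for i in order:
        cost = sentences[i].length
        if used + cost <= budget:  # hard constraint: stay within the user's limit
            chosen.append(i)
            used += cost
    chosen.sort()  # restore original document order for a readable summary
    return chosen, used


if __name__ == "__main__":
    doc = [
        Sentence("storm hits the coast", 3.0),
        Sentence("thousands evacuated", 4.0),
        Sentence("officials held a press conference today", 2.0),
        Sentence("roads closed", 0.5),
    ]
    picked, words = greedy_knapsack(doc, budget=8)
    print(" ".join(doc[i].text for i in picked), f"({words} words)")
```

With an 8-word budget, this selects the first, second, and fourth sentences for exactly 8 words, skipping the 6-word sentence that would overflow the limit. The density heuristic guarantees the length constraint but not the maximum total score; an exact dynamic-programming knapsack would trade extra computation for optimality.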
