What question did this study set out to answer?

April 25, 2026

Text-guided product image editing based on multimodal feature fusion

Key points

This research aims to improve the multimodal consistency and semantic similarity of text-guided product image editing results through multimodal feature fusion.
Extracted shape features using Hu moments, described texture characteristics with a grey-level co-occurrence matrix, and detected edges using the Canny algorithm.
Integrated image features with target text information via a dual attention mechanism for multimodal feature fusion.
Utilized a generative adversarial network model to achieve text-guided product image editing with fused features.
Achieved a multimodal consistency coefficient of 0.98.
Demonstrated a visual semantic similarity of 0.990.
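
As a rough illustration of the shape and texture descriptors listed above, the following pure-Python sketch computes the first two Hu moment invariants and a grey-level co-occurrence matrix (GLCM) with its contrast statistic. The function names, the pixel offset, and the choice of contrast as the texture statistic are illustrative assumptions, not details taken from the study. Canny edge detection is omitted for brevity.

```python
def raw_moment(img, p, q):
    """Raw image moment M_pq of a 2-D list of pixel intensities."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def hu_first_two(img):
    """First two Hu invariants (translation-, scale- and rotation-invariant)."""
    m00 = raw_moment(img, 0, 0)
    cx = raw_moment(img, 1, 0) / m00  # centroid x
    cy = raw_moment(img, 0, 1) / m00  # centroid y

    def mu(p, q):
        # Central moment: measured relative to the centroid.
        return sum(((x - cx) ** p) * ((y - cy) ** q) * v
                   for y, row in enumerate(img)
                   for x, v in enumerate(row))

    def eta(p, q):
        # Normalised central moment (adds scale invariance).
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return h1, h2


def glcm(img, levels, dx=1, dy=0):
    """Normalised GLCM for the pixel offset (dx, dy).

    Pixel values are assumed to be integers in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                counts[img[y][x]][img[y2][x2]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    return sum(((i - j) ** 2) * v
               for i, row in enumerate(p)
               for j, v in enumerate(row))
```

Because the Hu invariants do not change when a shape is shifted or rotated, they suit product silhouettes that appear at different positions in the frame. The GLCM contrast, in turn, is zero for a flat region and grows as neighbouring grey levels differ more.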

Abstract

To improve the multimodal consistency and semantic similarity of product image editing results, this study proposes a text-guided product image editing method based on multimodal feature fusion. First, shape features are extracted with Hu moments, texture is described with a grey-level co-occurrence matrix, and edges are detected with the Canny algorithm. Second, these shape, texture, and edge features are fused with the target text information through a dual attention mechanism. Finally, a generative adversarial network takes the fused features of the target text and the existing image and produces the edited product image. In the reported experiments, the method achieves a multimodal consistency coefficient of 0.98 and a visual semantic similarity of 0.990.
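
The abstract does not say how the dual attention mechanism is built. One plausible reading, sketched here purely as an assumption, is a pair of scaled dot-product cross-attention passes: image features attend to text features, text features attend to image features, and the two pooled results are concatenated. The names `cross_attend`, `mean_pool`, and `dual_attention_fuse` are hypothetical, and the sketch leaves out the learned projection matrices a trained model would normally include.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention; keys and values are the same vectors."""
    d = len(keys_values[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        attended.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                         for j in range(d)])
    return attended


def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def dual_attention_fuse(image_feats, text_feats):
    """Assumed fusion: two cross-attention directions, pooled and concatenated."""
    image_given_text = cross_attend(image_feats, text_feats)
    text_given_image = cross_attend(text_feats, image_feats)
    return mean_pool(image_given_text) + mean_pool(text_given_image)


# Toy input: two image feature vectors (e.g. shape and texture) and one text token.
fused = dual_attention_fuse([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.25]])
```

In this toy setup the fused vector has twice the feature dimension. In the pipeline the abstract describes, a vector of this kind would condition the generative adversarial network.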
