June 15, 2026 | Open Access

Multi-modal e-commerce data analysis system based on deep learning: visual perception and emotional computing

Key Points

Aimed to develop a unified model for analyzing product images and text in e-commerce to improve user experience and decision making.
Developed a multi-modal hierarchical collaborative fusion model (MHCFM) that integrates visual and textual data.
Used a hierarchical visual transformer and a dual-branch aesthetic network to analyze product images.
Evaluated sentiment analysis accuracy and inference time on public and large-scale e-commerce datasets.
Achieved sentiment analysis accuracy above 93% with an inference time of 22–23 ms, outperforming mainstream models.
Achieved an average accuracy of 91.5% in cross-cultural and multi-category tests, demonstrating robustness across contexts.
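
The key points do not say how MHCFM combines visual and textual features. As a minimal sketch, assuming both modalities have already been encoded into feature vectors of the same length, an element-wise gated fusion lets a learned gate decide, per dimension, how much each modality contributes. The function `adaptive_fuse`, its gate parameters `w_v`, `w_t`, and `bias`, and the gating formula are hypothetical illustrations, not the paper's fusion network.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def adaptive_fuse(visual: List[float], text: List[float],
                  w_v: List[float], w_t: List[float],
                  bias: List[float]) -> List[float]:
    """Hypothetical element-wise gated fusion of a visual and a text feature vector.

    For each dimension i:
        gate_i  = sigmoid(w_v[i] * visual[i] + w_t[i] * text[i] + bias[i])
        fused_i = gate_i * visual[i] + (1 - gate_i) * text[i]
    A gate near 1 favors the visual feature; a gate near 0 favors the text feature.
    """
    if not (len(visual) == len(text) == len(w_v) == len(w_t) == len(bias)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for v, t, a, b, c in zip(visual, text, w_v, w_t, bias):
        gate = sigmoid(a * v + b * t + c)
        fused.append(gate * v + (1.0 - gate) * t)
    return fused
```

With all gate parameters at zero, the gate is 0.5 everywhere, so `adaptive_fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])` returns the element-wise mean `[2.0, 3.0]`. In a trained model, such gate parameters would typically be learned so that whichever modality is more informative for a given dimension dominates.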

Abstract

E-commerce platforms generate vast amounts of multi-modal data (product images and user reviews), whose integrated analysis is crucial for enhancing user experience and decision making. However, existing methods often treat visual perception and text sentiment analysis separately, which limits cross-modal semantic collaboration. This study therefore proposes a multi-modal hierarchical collaborative fusion model (MHCFM) that unifies product visual attributes, aesthetic quality, scene context, and textual emotion through cross-modal alignment and hierarchical adaptive fusion. The model integrates a hierarchical visual transformer, a dual-branch aesthetic network, a graph convolutional scene module, and a hierarchical adaptive fusion network. Experiments on public and large-scale e-commerce datasets showed that sentiment analysis accuracy exceeded 93% with an inference time of 22–23 ms, outperforming mainstream models. In cross-cultural and multi-category tests, the average accuracy was 91.5%, demonstrating robustness. The proposed model strengthens visual-textual collaboration and offers an efficient solution for intelligent product analysis and user experience optimization in e-commerce.
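
The abstract names a graph convolutional scene module but gives no details of its design. A minimal sketch, assuming the standard graph-convolution propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W): the nodes hypothetically stand for elements of a product scene (for example, the product and surrounding objects), A is their adjacency matrix, H holds one feature row per node, D is the degree matrix of A + I, and W is a weight matrix. The helpers `matmul` and `gcn_layer` are hypothetical names, and the code uses plain Python lists to stay self-contained; it is not the paper's module.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as nested lists."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("incompatible matrix shapes")
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(cols)]
            for i in range(len(a))]

def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """Hypothetical single GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    if n == 0 or len(features) != n:
        raise ValueError("adjacency and feature matrices must have the same number of nodes")
    # Add self-loops (A + I) so each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees in the self-looped graph (always >= 1 for non-negative A).
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: entry (i, j) becomes a_hat[i][j] / sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, apply the linear transform W, then ReLU.
    out = matmul(matmul(norm, features), weight)
    return [[max(0.0, x) for x in row] for row in out]
```

For two connected nodes with features `[1.0]` and `[3.0]` and `W = [[1.0]]`, `gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]])` gives each node the normalized average `2.0`, which illustrates how such a layer mixes context from neighboring scene elements into each node's representation.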