With the prevalence of increasingly diverse and multiform user-generated content on social networking sites, multimodal sentiment analysis has become an important research topic in recent years. Previous work on multimodal sentiment analysis directly extracts a feature representation of each modality and fuses these features for classification. Consequently, detailed semantic information relevant to sentiment, as well as the correlation between image and text, has been ignored. In this paper, we propose a deep semantic network, namely MultiSentiNet, for multimodal sentiment analysis. We first identify object and scene detectors as salient detectors for extracting deep semantic features from images. We then propose a visual feature guided attention LSTM model to extract the words that are important for understanding the sentiment of the whole tweet, and to aggregate the representations of those informative words with the visual semantic features (object and scene). Experiments on two publicly available sentiment datasets verify the effectiveness of our MultiSentiNet model and show that the extracted semantic features correlate highly with human sentiments.
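The visual feature guided attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the additive scoring function, the function names (`visual_guided_attention`, `fuse`), the weight matrices `W_h`, `W_v`, `w`, and all dimensions are assumptions chosen for illustration, and the per-word hidden states are random stand-ins for what an LSTM would produce.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def visual_guided_attention(hidden_states, visual_feat, W_h, W_v, w):
    """Weight each word's hidden state by its relevance to the image.

    Assumed additive scoring: score_t = w . tanh(W_h h_t + W_v v).
    Returns the attention weights and the weighted sum of hidden states.
    """
    v_proj = matvec(W_v, visual_feat)
    scores = []
    for h in hidden_states:
        h_proj = matvec(W_h, h)
        act = [math.tanh(a + b) for a, b in zip(h_proj, v_proj)]
        scores.append(sum(wi * ai for wi, ai in zip(w, act)))
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(alpha * h[i] for alpha, h in zip(alphas, hidden_states))
        for i in range(dim)
    ]
    return alphas, context


def fuse(text_vec, object_feat, scene_feat):
    """Aggregate text and visual semantic features by concatenation (assumed)."""
    return list(text_vec) + list(object_feat) + list(scene_feat)


def rand_matrix(rng, rows, cols):
    """Hypothetical helper: small random weights for the demo."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Demo with made-up sizes: 5 words, hidden size 4, object/scene size 3 each.
rng = random.Random(0)
T, d, k, a = 5, 4, 3, 6
hidden_states = rand_matrix(rng, T, d)   # stand-in for LSTM outputs
object_feat = [rng.random() for _ in range(k)]
scene_feat = [rng.random() for _ in range(k)]
visual_feat = object_feat + scene_feat   # guide vector of size 2k

W_h = rand_matrix(rng, a, d)
W_v = rand_matrix(rng, a, 2 * k)
w = [rng.uniform(-0.5, 0.5) for _ in range(a)]

alphas, context = visual_guided_attention(hidden_states, visual_feat, W_h, W_v, w)
fused = fuse(context, object_feat, scene_feat)
```

In this sketch the attention weights sum to one, so `context` is a convex combination of the word representations, and `fused` is the vector a downstream sentiment classifier would consume.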
Nan Xu
Wenji Mao
University of Chinese Academy of Sciences
Institute of Automation
Xu et al. studied this question.
www.synapsesocial.com/papers/6a0f3f99f7e1df59726c9c8d — DOI: https://doi.org/10.1145/3132847.3133142