To address the problem of effectively integrating visual content and semantic information in oil paintings, this paper proposes an emotion recognition model for oil paintings based on a multimodal adaptive deep network. The model processes visual and textual information through a dual-path architecture: one path extracts deep visual features from the paintings, and the other extracts contextual semantic features from the associated texts. An adaptive feature fusion module uses cross-modal attention and gating mechanisms to adjust the fusion weights of the different modality features adaptively. Experiments on the ArtEmis oil painting dataset show that the proposed model achieves 76.8% accuracy on the discrete emotion classification task and an RMSE of 0.319 on continuous emotion dimension prediction. Compared with the baseline model, it achieves higher classification accuracy, which demonstrates the effectiveness of the adaptive fusion mechanism for multimodal analysis of emotion in art.
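The abstract does not give the fusion equations, so the following is only a minimal NumPy sketch of how cross-modal attention and gating can be combined into adaptive fusion weights. It assumes scaled dot-product cross-attention in both directions, mean pooling, and a per-dimension sigmoid gate. All function names, feature dimensions, and the random weights are hypothetical illustrations, not the author's implementation. A small RMSE helper is included for the continuous-dimension metric the abstract reports.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_attention(queries, keys_values):
    # Scaled dot-product attention: each query row attends over all rows of the other modality.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values

def gated_fusion(visual, text, w_gate, b_gate):
    """Hypothetical adaptive fusion: cross-modal attention followed by a sigmoid gate.

    visual: (n_regions, d) visual features; text: (n_tokens, d) text features.
    w_gate: (2*d, d) and b_gate: (d,) are illustrative gate parameters.
    """
    # Each modality attends to the other; the residual keeps the original features.
    v_ctx = visual + cross_attention(visual, text)
    t_ctx = text + cross_attention(text, visual)
    # Mean-pool to a single vector per modality.
    v = v_ctx.mean(axis=0)
    t = t_ctx.mean(axis=0)
    # Gate values in (0, 1) set, per feature dimension, how much each modality contributes.
    g = sigmoid(np.concatenate([v, t]) @ w_gate + b_gate)
    fused = g * v + (1.0 - g) * t
    return fused, g, v, t

def rmse(pred, target):
    # Root-mean-square error, the metric used for continuous emotion dimensions.
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Usage with random stand-in features (hypothetical sizes).
rng = np.random.default_rng(0)
d = 8
visual = rng.normal(size=(5, d))   # e.g. 5 image-region features
text = rng.normal(size=(7, d))     # e.g. 7 token features from the associated text
w_gate = rng.normal(scale=0.1, size=(2 * d, d))
b_gate = np.zeros(d)
fused, g, v, t = gated_fusion(visual, text, w_gate, b_gate)
print(fused.shape)  # (8,)
```

Because the gate is computed from both pooled modalities, the mixing weight changes per sample: a painting whose caption is uninformative can push the gate toward the visual vector, and vice versa. In a full model the fused vector would feed a classification head (discrete emotions) and a regression head (continuous dimensions).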
Guixiang Chang (Thu,) studied this question.