Music can invest even the tritest scene with meaning. Human perceptions of music and images are closely related, since both can evoke similar sensations and emotions. Advertising agencies often pair audio and music with their visuals to engage wider audiences and to convey the emotions associated with their content more effectively. Matching visuals and music that evoke comparable feelings may help people perceive those emotions more vividly. This paper proposes a cross-modal neural network that recommends music to a user by matching images and music in a common emotional vector space. A combined image-music pair dataset was created using valence and arousal values. The images in this dataset are drawn from the OASIS dataset, while the music is retrieved via the Spotify API and YouTube. A transfer learning approach with convolutional neural network architectures is proposed for training on this dataset, using MobileNetV3, ResNet-18 and EfficientNetB4 for the images and SampleCNN for the raw audio clips. For any given input image, a list of top-n music recommendations is returned. The approach thus aims to match music and images based on deep hidden features over the emotion space of the two modalities.
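The final retrieval step described in the abstract, returning the top-n tracks for an image in a shared emotion space, can be illustrated with a minimal sketch. It assumes that each modality's network has already produced a (valence, arousal) pair and that closeness is measured by Euclidean distance. The abstract does not state the distance measure, the value ranges, or any of the names used here (`top_n_tracks`, `catalog`, the track labels), so all of these are hypothetical.

```python
import math
from typing import List, Tuple

# Assumption: an emotion is a (valence, arousal) pair, here scaled to [0, 1].
Emotion = Tuple[float, float]


def top_n_tracks(
    image_emotion: Emotion,
    catalog: List[Tuple[str, Emotion]],
    n: int = 3,
) -> List[str]:
    """Return the ids of the n tracks closest to the image in emotion space.

    Euclidean distance is an illustrative choice, not necessarily the
    measure used in the paper.
    """
    ranked = sorted(catalog, key=lambda item: math.dist(image_emotion, item[1]))
    return [track_id for track_id, _ in ranked[:n]]


# Hypothetical catalog: in practice these values would come from the audio model.
catalog = [
    ("calm_piano", (0.6, 0.2)),
    ("upbeat_pop", (0.9, 0.8)),
    ("dark_ambient", (0.2, 0.3)),
    ("aggressive_metal", (0.1, 0.9)),
]

# Hypothetical image prediction: fairly positive and fairly energetic.
recommendations = top_n_tracks((0.8, 0.7), catalog, n=2)
print(recommendations)
```

In this toy setup the image's emotion vector is compared against every track, which is adequate for a small catalog; a larger catalog would likely call for an approximate nearest-neighbor index instead of a full sort.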
Chheda et al. (Sun,) studied this question.