A siamese vision transformer-based model for automatic music emotion annotation and classification | Synapse