Multimodal Fusion of Music Theory-Inspired and Self-Supervised Representations for Improved Emotion Recognition | Synapse