Multimodal Cross- and Self-Attention Network for Speech Emotion Recognition