Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection | Synapse