Joint Spatio-Temporal-Frequency Representation Learning for Improved Sound Event Localization and Detection | Synapse