Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions Through Masked Modeling