Scalable Audio-Visual Masked Autoencoders for Efficient Affective Video Facial Analysis