Multimodal and Temporal Perception of Audio-visual Cues for Emotion Recognition | Synapse