Multi-Modal Emotion Recognition by Text, Speech and Video Using Pretrained Transformers | Synapse