Conditional selection with CNN augmented transformer for multimodal affective analysis | Synapse