Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention