Robust Audio-Visual ASR with Unified Cross-Modal Attention | Synapse