Multi-Modality Speech Recognition Driven by Background Visual Scenes