USEV: Universal Speaker Extraction With Visual Cue | Synapse