Answering Diverse Questions via Text Attached with Key Audio-Visual Clues | Synapse