Harnessing Representative Spatial-Temporal Information for Video Question Answering | Synapse