Video Question Answering via Hierarchical Spatio-Temporal Attention Networks | Synapse