Multi-Modal Explicit Sparse Attention Networks for Visual Question Answering