Mask Attention Networks: Rethinking and Strengthen Transformer