Cloze-driven Pretraining of Self-attention Networks | Synapse