Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | Synapse