Are vision transformers replacing convolutional neural networks in scene interpretation?: A review