The evolution of object detection from CNNs to transformers and multi-modal fusion | Synapse