End-to-End Video Object Detection with Spatial-Temporal Transformers