Vision-language alignment with sigmoid loss and dual-token contrastive change localizer for precise change captioning | Synapse