wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models | Synapse