Policy Learning from Large Vision-Language Model Feedback without Reward Modeling | Synapse