Use a large language model to enhance the reasoning of another large language model through reward-updated GRPO | Synapse