Exploring the Optimization of RLHF and its Variants in Aligning Large Models with Human Preferences | Synapse