Focus on Optimizing the RLHF Algorithm to Improve LLM Post-Training | Synapse