We introduce a reinforcement learning (RL) framework that optimizes discrete natural language prompts to improve both the accuracy and the clarity of sentence simplification. Using a lightweight PPO policy, our method learns to guide a frozen, small-scale LLaMA-3.2B model toward effective simplification in support of user-centric computational thinking tasks. Results show that our RL-optimized prompts significantly surpass manual baselines in semantic fidelity, logical coherence, and instructional quality. Moreover, the approach enables this much smaller LLM to produce results comparable in clarity and instructional value to those of the much larger LLaMA-3.3 70B model.
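To make the idea of a PPO policy over discrete prompts concrete, the following is a minimal sketch and not the authors' implementation. It treats prompt selection as a bandit over a small, hypothetical pool of candidate prompts and updates a softmax policy with PPO's clipped surrogate objective. The prompt pool, the `stub_reward` function (which stands in for scoring the frozen LLM's output), and all hyperparameters are assumptions made for illustration; the abstract does not specify the actual action space, reward, or policy architecture.

```python
import math
import random

# Hypothetical candidate prompts (illustrative only, not taken from the paper).
PROMPTS = [
    "Simplify this sentence.",
    "Rewrite the sentence in plain words, keeping its meaning.",
    "Explain this sentence step by step for a beginner.",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stub_reward(action, rng):
    """Placeholder reward (assumption). A real setup would send the chosen
    prompt to the frozen LLM and score the simplified sentence it returns."""
    means = [0.2, 0.8, 0.5]
    return means[action] + rng.gauss(0.0, 0.05)


def ppo_update(logits, batch, old_probs, eps=0.2, lr=0.5, epochs=4):
    """Gradient ascent on the PPO clipped surrogate for a softmax policy.

    batch is a list of (action, advantage) pairs sampled under old_probs.
    """
    logits = list(logits)
    for _ in range(epochs):
        probs = softmax(logits)
        grad = [0.0] * len(logits)
        for action, adv in batch:
            ratio = probs[action] / old_probs[action]
            # When the clipped term is the active minimum, the gradient is zero.
            if (adv > 0 and ratio > 1 + eps) or (adv < 0 and ratio < 1 - eps):
                continue
            for k in range(len(logits)):
                indicator = 1.0 if k == action else 0.0
                # d(ratio)/d(logit_k) = ratio * (1[k == action] - probs[k])
                grad[k] += adv * ratio * (indicator - probs[k]) / len(batch)
        logits = [l + lr * g for l, g in zip(logits, grad)]
    return logits


def train(iterations=200, batch_size=32, seed=0):
    """Return the final prompt-selection probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(PROMPTS)
    for _ in range(iterations):
        old_probs = softmax(logits)
        actions = rng.choices(range(len(PROMPTS)), weights=old_probs, k=batch_size)
        rewards = [stub_reward(a, rng) for a in actions]
        baseline = sum(rewards) / len(rewards)  # mean-reward baseline
        batch = [(a, r - baseline) for a, r in zip(actions, rewards)]
        logits = ppo_update(logits, batch, old_probs)
    return softmax(logits)


if __name__ == "__main__":
    for prompt, p in zip(PROMPTS, train()):
        print(f"{p:.3f}  {prompt}")
```

Under the stub reward, the policy should shift most of its probability toward the second prompt, since that one has the highest assumed mean reward. The LLM never receives gradients in this setup; only the small policy's logits are trained, which is the sense in which the model stays frozen.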
Bhatt et al. (Fri,) studied this question.