Prompt engineering provides an efficient way to adapt large language models (LLMs) to downstream tasks without retraining model parameters. However, designing effective prompts is challenging, especially when model gradients are unavailable and human expertise is required. Existing automated methods based on gradient optimization or heuristic search have inherent limitations under black-box or limited-query conditions. We propose Domain-Aware Reinforcement Learning for Prompt Optimization (DA-RLPO), which treats prompt editing as a sequential decision process and uses structured domain knowledge to constrain candidate edits. Experimental results show that DA-RLPO achieves higher accuracy than baselines on text classification tasks and maintains robust performance with a limited number of API calls, and that it is also effective on text-to-image and reasoning tasks.
Mengqi Gao
Bowen Sun
Tong Wang
Mathematics
Durham University
China Jiliang University
Shanghai Polytechnic University
Gao et al. studied this question.
www.synapsesocial.com/papers/68c1c23d54b1d3bfb60efd7f — DOI: https://doi.org/10.3390/math13162552