
September 10, 2025 · Open Access

Domain-Aware Reinforcement Learning for Prompt Optimization

Key Points

DA-RLPO enhances prompt editing with structured domain knowledge, improving the accuracy of large language models on downstream tasks.
In experiments, DA-RLPO achieved higher accuracy than baseline methods on text classification tasks and remained robust when API queries were limited.
DA-RLPO treats prompt editing as a sequential decision process, using domain knowledge to constrain the set of candidate edits.
Beyond text classification, the method is also effective on text-to-image and reasoning tasks.

Abstract

Prompt engineering provides an efficient way to adapt large language models (LLMs) to downstream tasks without retraining model parameters. However, designing effective prompts is challenging: model gradients are often unavailable, and the process typically requires human expertise. Existing automated methods based on gradient optimization or heuristic search have inherent limitations under black-box or limited-query conditions. We propose Domain-Aware Reinforcement Learning for Prompt Optimization (DA-RLPO), which treats prompt editing as a sequential decision process and leverages structured domain knowledge to constrain candidate edits. Our experimental results show that DA-RLPO achieves higher accuracy than baselines on text classification tasks and maintains robust performance with limited API calls, while also demonstrating effectiveness on text-to-image and reasoning tasks.
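The abstract describes DA-RLPO only at a high level. As a hedged illustration of the general idea, and not the authors' algorithm, the sketch below frames prompt editing as a short sequence of edit actions, uses a hypothetical keyword table as a stand-in for "structured domain knowledge" to mask off-domain edits, and trains a simple stateless softmax policy with REINFORCE under a fixed budget of black-box scoring calls. The edit list, `DOMAIN_KEYWORDS`, `toy_score`, and all hyperparameters are assumptions made up for illustration; `toy_score` replaces what would be a real LLM API call.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# ILLUSTRATIVE SKETCH ONLY: not the DA-RLPO algorithm from the paper.
# Hypothetical edit library: each edit appends an instruction to the prompt.
CANDIDATE_EDITS: List[str] = [
    "Answer with one word.",
    "Classify the sentiment as positive or negative.",
    "Think about the movie review carefully.",
    "Write a poem.",          # off-domain edit
    "Translate to French.",   # off-domain edit
]

# Hypothetical stand-in for structured domain knowledge: on-topic keywords.
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "sentiment": ["sentiment", "review", "positive", "negative", "word"],
}

def allowed_edits(domain: str) -> List[int]:
    """Constraint step: keep only edits that mention a domain keyword."""
    keys = DOMAIN_KEYWORDS[domain]
    return [i for i, e in enumerate(CANDIDATE_EDITS)
            if any(k in e.lower() for k in keys)]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def optimize_prompt(base_prompt: str,
                    score_prompt: Callable[[str], float],
                    domain: str,
                    query_budget: int = 20,
                    steps_per_episode: int = 2,
                    lr: float = 0.5,
                    seed: int = 0) -> Tuple[str, float, int]:
    rng = random.Random(seed)
    actions = allowed_edits(domain)       # masked action space
    logits = [0.0] * len(actions)         # one preference per allowed edit
    baseline = 0.0                        # running-average reward baseline
    best_prompt, best_score = base_prompt, float("-inf")
    queries = 0
    while queries < query_budget:
        probs = softmax(logits)           # policy is fixed within an episode
        prompt, chosen = base_prompt, []
        for _ in range(steps_per_episode):            # sequential edit decisions
            a = rng.choices(range(len(actions)), weights=probs)[0]
            chosen.append(a)
            prompt = prompt + " " + CANDIDATE_EDITS[actions[a]]
        reward = score_prompt(prompt)     # exactly one black-box query per episode
        queries += 1
        if reward > best_score:
            best_prompt, best_score = prompt, reward
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # REINFORCE: d log softmax(a) / d logits = onehot(a) - probs
        grad = [0.0] * len(logits)
        for a in chosen:
            for j in range(len(logits)):
                grad[j] += (1.0 if j == a else 0.0) - probs[j]
        logits = [l + lr * advantage * g for l, g in zip(logits, grad)]
    return best_prompt, best_score, queries

def toy_score(prompt: str) -> float:
    """Hypothetical stand-in for an LLM evaluation call."""
    score = 0.0
    if "positive or negative" in prompt:
        score += 1.0
    if "one word" in prompt:
        score += 0.5
    return score - 0.01 * len(prompt.split())   # mild length penalty

if __name__ == "__main__":
    prompt, score, used = optimize_prompt("Review: {text}", toy_score, "sentiment")
    print(used)  # 20
    print(prompt, round(score, 3))
```

Because the domain mask is applied before sampling, off-domain edits such as "Write a poem." can never be chosen, and the query counter enforces the limited-API-call setting the abstract mentions. A real system would condition the policy on the current prompt state and use task accuracy from the target model as the reward.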
