As diffusion-based image generation models grow in parameter count, they can process a wider range of prompts, enabling more diverse and complex generation tasks. However, this growth introduces challenges in attention distribution. Even with enhanced generative capabilities, the model's attention may become dispersed across a wider range of information, hindering its ability to focus on task-specific details. For instance, when a prompt contains multiple elements, the model may lose focus, leading to missing details and lower image quality. We propose a reinforcement learning-based optimization framework for image generation that incorporates an aesthetic feedback mechanism. By applying token-level policy gradient control and assigning aesthetic weights with an aesthetic model, the framework guides the model's attention toward the target details, thereby improving image quality. We refer to this method as Token-Level Proximal Policy Optimization (TokenPPO). We demonstrate that applying TokenPPO significantly improves the aesthetic scores, human satisfaction, and other evaluation metrics of the generated images.
Zero Tendou (Tue,) studied this question.
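The abstract does not state how the aesthetic weights enter the policy-gradient objective. As a rough illustration only, the sketch below shows one possible reading: the standard PPO clipped surrogate, with each per-token term scaled by a normalized weight. The function name `token_weighted_ppo_loss`, its arguments, the shared scalar advantage, and the weight normalization are all assumptions made for this example and are not taken from the paper.

```python
import math


def token_weighted_ppo_loss(new_logp, old_logp, advantage, token_weights, clip_eps=0.2):
    """Hypothetical token-weighted PPO clipped loss (not the paper's definition).

    new_logp, old_logp: per-token log-probabilities under the current and
        the behavior policy (assumed to be available per token).
    advantage: one scalar advantage shared by all tokens (an assumption).
    token_weights: non-negative per-token weights, e.g. from an aesthetic model.
    """
    if not (len(new_logp) == len(old_logp) == len(token_weights)):
        raise ValueError("all per-token inputs must have the same length")
    total_weight = sum(token_weights)
    if total_weight <= 0:
        raise ValueError("token weights must sum to a positive value")

    loss = 0.0
    for lp_new, lp_old, weight in zip(new_logp, old_logp, token_weights):
        # Importance ratio between the current and the behavior policy.
        ratio = math.exp(lp_new - lp_old)
        # Standard PPO clipping of the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (minimum) surrogate, as in the usual PPO objective.
        surrogate = min(ratio * advantage, clipped * advantage)
        # Normalized weight decides how much this token contributes.
        loss -= (weight / total_weight) * surrogate
    return loss


# Usage with made-up numbers: the second token is weighted three times as heavily.
example_loss = token_weighted_ppo_loss(
    new_logp=[-1.0, -2.0],
    old_logp=[-1.0, -2.0],
    advantage=0.5,
    token_weights=[1.0, 3.0],
)
```

Under this reading, tokens with larger weights contribute more to the update, which is one way a policy gradient could be steered toward selected prompt details. Normalizing the weights keeps the loss on the same scale as unweighted PPO; whether TokenPPO does this is not stated in the abstract.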