February 11, 2026 | Open Access

Flow-Multi: A Flow-Matching Multi-Reward Framework for Text-to-Image Generation

Key Points

The goal is to improve text-to-image generation by addressing issues with single reward functions.
Developed the Flow-Multi framework, built on flow-matching-based group-relative policy optimization (GRPO) learning.
Evaluated samples using four reward models: text-to-image alignment, human preference, aesthetic quality, and GenEval.
Applied Pareto dominance to remove dominated samples, so that only the non-dominated set is used for policy updates.
Introduced advantage masking to suppress the contribution of low-reward samples during policy optimization.
Flow-Multi showed balanced improvements across multiple reward criteria.
Compared with the existing Flow-GRPO, the results support multi-reward reinforcement learning as a route to stable alignment in generated outputs.
The framework targets reward hacking and imbalanced optimization, the problems attributed to single-reward training.
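
The Pareto filtering step in the key points can be illustrated with a minimal sketch. It assumes every reward is "higher is better" and that each sample carries one score per reward model. The function names, the example scores, and the handling of ties are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if reward vector a Pareto-dominates b (higher is better)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_indices(rewards: Sequence[Sequence[float]]) -> List[int]:
    """Indices of samples that no other sample in the group dominates."""
    return [
        i
        for i, r_i in enumerate(rewards)
        if not any(dominates(r_j, r_i) for j, r_j in enumerate(rewards) if j != i)
    ]


# Hypothetical group of 4 samples; columns are made-up scores for
# (alignment, human preference, aesthetics, GenEval), all "higher is better".
group = [
    [0.9, 0.2, 0.5, 0.7],
    [0.8, 0.1, 0.4, 0.6],  # no better than sample 0 on any axis, so dominated
    [0.3, 0.9, 0.6, 0.5],
    [0.3, 0.9, 0.6, 0.5],  # identical to sample 2: neither dominates the other
]
kept = non_dominated_indices(group)
```

In this sketch, samples 0 and 2 trade off against each other (one wins on alignment, the other on human preference), so both survive the filter. That trade-off is what a single scalar reward would hide.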

Abstract

Recent approaches in text-to-image (T2I) generation have actively adopted reinforcement learning (RL) techniques for human preference alignment. However, existing approaches primarily rely on a single reward function, which can lead to overfitting on specific metrics, resulting in issues such as reward hacking and imbalanced optimization among multiple objectives. To address this, we propose Flow-Multi: a flow-matching multi-reward framework for text-to-image generation. Our method builds upon flow-matching-based group-relative policy optimization (GRPO) learning. Each sample is evaluated by four reward models—based on text-to-image alignment, human preference, aesthetic quality, and GenEval—to create a multi-dimensional reward vector. We then utilize the Pareto dominance relationship to remove dominated samples and update the policy using only the non-dominated set. Additionally, we introduce advantage masking during training to suppress the contribution of low-reward samples, ensuring that only high-quality rewards are reflected in policy optimization. Experimental results demonstrate that Flow-Multi achieves balanced improvements across multiple reward criteria compared to the existing Flow-GRPO, validating the effectiveness of the multi-reward reinforcement learning framework for stable alignment in text-to-image generation.
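
As a rough illustration of advantage masking, the sketch below standardizes scores within a group, as GRPO-style methods do, and then zeroes out the advantages of low-scoring samples. The abstract does not give the masking rule or say how the four rewards are combined. The zero threshold, the use of one scalar score per sample, and all names here are therefore assumptions made only for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(scores: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize each score within its group."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]


def mask_advantages(advantages: Sequence[float], threshold: float = 0.0) -> List[float]:
    """Zero out advantages at or below the threshold (assumed masking rule)."""
    return [a if a > threshold else 0.0 for a in advantages]


# Hypothetical scalar scores for the samples generated from one prompt.
scores = [0.2, 0.4, 0.6, 0.8]
adv = group_relative_advantages(scores)
masked = mask_advantages(adv)
```

Under this assumed rule, the two below-average samples contribute nothing to the policy update. The two above-average samples keep their original advantages.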