Agricultural AI is often constrained by limited, imbalanced plant image datasets and by pronounced domain shift when models trained on controlled indoor imagery are deployed in field conditions. To address these challenges, we propose an integrated diffusion-based framework with three components that can be used independently or as complementary stages: (1) text-conditioned plant image synthesis to expand labeled training data, (2) indoor-to-outdoor image translation to mitigate domain shift, and (3) expert preference-aligned fine-tuning to improve agronomic realism and output stability. Our implementation fine-tunes a Stable Diffusion v1.4 backbone on our domain-specific image dataset; the resulting model then serves as the base model for the image-translation module, which is trained with the DreamBooth strategy. The fine-tuned generative model is further optimized with a reward-weighted mechanism that uses expert scores to refine image quality. We evaluate the framework with standard generative metrics, Inception Score (IS) and Fréchet Inception Distance (FID), and on downstream agricultural tasks, including phenotype classification and weed detection with YOLOv8. Results indicate that the components are synergistic: the synthesis model provides a strong initialization for translation, translation improves field realism while retaining utility for data augmentation, and preference alignment further enhances consistency and expert-perceived quality. Overall, the proposed framework offers a practical, data-efficient, and expert-aware generative pipeline for real-world agricultural AI.
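The abstract does not specify how expert scores enter the fine-tuning objective. As a minimal sketch of one common reward-weighted formulation, the example below assumes that each generated image has a scalar expert score and a per-sample denoising loss, and that the scores are turned into weights with a softmax. The function names, the `temperature` parameter, and the softmax weighting itself are illustrative assumptions, not the authors' implementation.

```python
import math


def reward_weights(scores, temperature=1.0):
    """Turn expert scores into weights that sum to 1.

    Assumption: a softmax over scores. The abstract only says the
    mechanism is "reward-weighted"; the exact form is not given.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reward_weighted_loss(per_sample_losses, scores, temperature=1.0):
    """Weighted sum of per-sample losses, emphasizing high-scored images."""
    if len(per_sample_losses) != len(scores):
        raise ValueError("losses and scores must have the same length")
    weights = reward_weights(scores, temperature)
    return sum(w * loss for w, loss in zip(weights, per_sample_losses))


if __name__ == "__main__":
    # Hypothetical values: three generated images, expert scores on a 1-5 scale.
    losses = [0.42, 0.31, 0.55]
    expert_scores = [2.0, 5.0, 3.0]
    print(reward_weights(expert_scores))
    print(reward_weighted_loss(losses, expert_scores))
```

Under this assumed weighting, a lower `temperature` concentrates the training signal on the highest-scored images, while a higher one approaches a plain average of the per-sample losses.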
Tan et al. (Fri,) studied this question.