Food image generation is an important research direction in food computing, aiming to produce highly realistic images that accurately capture the visual characteristics of various dishes while adhering to specified input conditions. Existing methods that rely solely on textual descriptions struggle to handle the large intra-class variability of food, often resulting in limited diversity and accuracy. Although some approaches incorporate additional conditions, they are generally not optimized for food-specific challenges, leading to inconsistencies in texture, shape, and color fidelity. To address these limitations, we propose CondFoodGen, a diffusion-based two-stream network for controllable food image generation. The architecture consists of a control stream and a generation stream, where the control stream provides conditional guidance to regulate the generation process. To optimize bidirectional interactions between the two streams, we introduce the Bidirectional Adaptive Gating (BAG) mechanism, which not only guides synthesis but also adaptively refines control representations through feedback from the generation stream. In addition, we propose the Wavelet-Guided Hierarchical Attention (WGHA) module, which combines wavelet-based multi-frequency analysis with hierarchical attention to enhance fine-grained texture fidelity and structural realism. A progressive multi-stage training strategy further stabilizes optimization and enables seamless integration of conditional guidance with bidirectional interaction. Extensive experiments on three food image datasets demonstrate that CondFoodGen consistently generates high-quality and diverse images. Compared with the best existing food image generation methods, our approach achieves an average improvement of about 11.0% across three evaluation metrics; compared with the leading conditional generation approaches, the average improvement reaches 16.2%.
The source code, trained models, and supplementary materials are publicly available at https://github.com/housujuan123/CondFoodGen.
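To make the phrase "wavelet-based multi-frequency analysis" concrete, the following is a minimal sketch of a one-level 2D Haar wavelet decomposition, which splits an image into one low-frequency band and three high-frequency detail bands. It is an illustration of the general technique only and is not the WGHA module: the function name `haar_dwt2`, the band names, and the choice of the Haar wavelet are all assumptions made for this example, since the abstract does not specify which wavelet or decomposition depth is used.

```python
def haar_dwt2(image):
    """One-level 2D Haar decomposition of a grid with even height and width.

    Hypothetical helper for illustration; not taken from the CondFoodGen code.
    Returns a dict of four sub-bands, each half the input height and width:
      low      - local averages (coarse structure)
      col_diff - differences between adjacent columns
      row_diff - differences between adjacent rows
      diag     - diagonal differences
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")

    bands = {"low": [], "col_diff": [], "row_diff": [], "diag": []}
    for r in range(0, rows, 2):
        out = {name: [] for name in bands}
        for c in range(0, cols, 2):
            # The 2x2 block of pixels handled in this step.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            # Dividing by 2 makes the transform orthonormal (energy preserving).
            out["low"].append((a + b + d + e) / 2)
            out["col_diff"].append((a - b + d - e) / 2)
            out["row_diff"].append((a + b - d - e) / 2)
            out["diag"].append((a - b - d + e) / 2)
        for name in bands:
            bands[name].append(out[name])
    return bands


# A 2x2 patch whose right column is brighter than its left column:
# all detail energy lands in the column-difference band.
patch = [[0, 4],
         [0, 4]]
bands = haar_dwt2(patch)
# bands["low"] == [[4.0]], bands["col_diff"] == [[-4.0]]
```

In a frequency-aware attention design, the low band would typically carry overall shape and color, while the three detail bands carry edges and texture; how CondFoodGen actually weights or attends to such bands is described in the paper body and released code, not in this sketch.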
Zhao et al. studied this question.