Advertising relies heavily on compelling visuals to engage audiences across sectors. Recent advances in AI-driven text-to-image generation, particularly diffusion models such as Stable Diffusion, offer new opportunities for hyper-personalized, context-aware advertising content. However, challenges remain in precise control over image composition, segmentation robustness, and semantic consistency. In this work, we enhance the state-of-the-art Anywhere-Multi-Agent framework by replacing its original RMBG segmentation module with the Segment Anything Model (SAM), integrated through an interactive web interface that enables user-guided mask refinement. We further improve generation fidelity through prompt engineering with language models, and we explore multiple ControlNet conditioning strategies, including Canny edges, depth maps, and their combination. Our experiments demonstrate significant gains in segmentation accuracy, object placement, and background coherence, enabling flexible and precise image composition suitable for real-world advertising workflows. These modular improvements pave the way for scalable, controllable generative pipelines that better align AI outputs with user intent.
Demirtas et al. (Tue,) studied this question.
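To make the composition step concrete, the sketch below shows one simple way a segmentation mask can control where a product is placed on a generated background. It is an illustrative assumption, not the authors' implementation: the function name `paste_with_mask`, the grayscale list-of-lists image representation, and the toy data are all hypothetical, and a real pipeline would operate on the mask returned by SAM and on full-resolution RGB images.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, kept minimal for illustration
Mask = List[List[int]]   # 1 = keep foreground pixel, 0 = keep background


def paste_with_mask(background: Image, foreground: Image, mask: Mask,
                    top: int, left: int) -> Image:
    """Return a copy of `background` with masked foreground pixels pasted
    so that the foreground's top-left corner lands at (top, left).

    Pixels that would fall outside the background are clipped.
    """
    out = [row[:] for row in background]  # do not mutate the input
    height, width = len(out), len(out[0])
    for y, (fg_row, mask_row) in enumerate(zip(foreground, mask)):
        for x, (value, keep) in enumerate(zip(fg_row, mask_row)):
            ty, tx = top + y, left + x
            if keep and 0 <= ty < height and 0 <= tx < width:
                out[ty][tx] = value
    return out


# Hypothetical toy data: a 3x4 empty background and a 2x2 "product"
# whose mask excludes the top-right pixel.
background = [[0] * 4 for _ in range(3)]
product = [[9, 9], [9, 9]]
product_mask = [[1, 0], [1, 1]]

composite = paste_with_mask(background, product, product_mask, top=1, left=2)
```

A tighter mask changes only which pixels are copied, so in this sketch a refined segmentation translates directly into a cleaner boundary between the placed object and the background.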