Text-to-image (T2I) diffusion models have become popular in computer vision, but they remain vulnerable to backdoor attacks. Existing methods typically trigger a fixed image regardless of user input, causing severe semantic inconsistency between the generated image and the original prompt. This lack of visual stealth makes the attack easily detectable by machines. To overcome this challenge, we propose MultiAttack, a novel semantic-preserving multi-object coexistence backdoor attack for T2I diffusion models, which retains prompt-described objects while injecting malicious targets. First, we propose a semantic-preserving data poisoning strategy that builds a latent mapping from the trigger into a composite semantic space while retaining the original prompt context. Second, we design a backdoor enhancement mechanism that embeds the spatial orthogonality between malicious and benign objects into the model weights as a conditional response, which strengthens the model’s ability to generate stable malicious outputs without requiring additional inference. Results on Stable Diffusion show that, compared to state-of-the-art baselines, MultiAttack increases attack success rate by 13.1% and visual stealth (defined as the success rate of co-generating both prompt-described and target objects) by 12.6%, with an FID increase of less than 1.2 and a CLIP score decrease of under 1 compared to clean models.
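To make the two headline metrics concrete, the following is a minimal sketch of how they could be computed from per-image detection outcomes. Visual stealth follows the definition given above (both prompt-described and target objects appear in the same image). The abstract does not define attack success rate, so the sketch assumes it is the fraction of triggered generations that contain the target object. The `Detection` record and both function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Detection:
    """Hypothetical per-image outcome for one triggered prompt."""
    has_prompt_objects: bool  # objects described in the prompt were detected
    has_target_object: bool   # the attacker's target object was detected


def attack_success_rate(results: Iterable[Detection]) -> float:
    """Assumed definition: share of images containing the target object."""
    items = list(results)
    if not items:
        raise ValueError("results must be non-empty")
    return sum(r.has_target_object for r in items) / len(items)


def visual_stealth(results: Iterable[Detection]) -> float:
    """Share of images containing BOTH prompt-described and target objects."""
    items = list(results)
    if not items:
        raise ValueError("results must be non-empty")
    both = sum(r.has_prompt_objects and r.has_target_object for r in items)
    return both / len(items)


# Toy data for illustration only; these are not results from the paper.
sample = [
    Detection(has_prompt_objects=True, has_target_object=True),
    Detection(has_prompt_objects=False, has_target_object=True),
    Detection(has_prompt_objects=True, has_target_object=False),
    Detection(has_prompt_objects=True, has_target_object=True),
]

print(attack_success_rate(sample))  # 0.75
print(visual_stealth(sample))       # 0.5
```

Under these assumed definitions, visual stealth can never exceed attack success rate, because every image counted as stealthy also contains the target object. A fixed-image backdoor of the kind described above would score high on the first metric and near zero on the second.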
Zhoufan et al. studied this question.