Automating packaging layout design requires balancing aesthetic appeal, semantic constraints, and personalised needs. This paper presents a human-computer collaborative reinforcement learning framework for packaging design that generates layouts by modelling the task as a Markov decision process. Our approach incorporates a dual-stream policy network for global composition planning and local element adjustment, alongside a joint reward function that integrates aesthetic, semantic, and human-preference signals. Training follows a two-stage strategy: pre-training with automated rewards, then fine-tuning via online human feedback. Experiments on a dataset of 10,000 packaging samples show that the framework outperforms existing methods in layout rationality, aesthetic score, rule compliance, and brand prominence. In user studies, professional designers rated our layouts higher for visual appeal and information clarity. This work offers a practical human-in-the-loop solution for automated creative design under complex constraints.
Xaoshan Chen (Thu,) studied this question.
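
To make the joint reward and the two-stage training described in the abstract more concrete, the following is a minimal sketch, not the paper's implementation. The abstract only says the reward "integrates" aesthetic, semantic, and human-preference signals; the weighted-sum form, the weight values, the renormalisation when no human score is available, and the names `RewardWeights` and `joint_reward` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical mixing weights; the source does not specify any values."""
    aesthetic: float = 0.4
    semantic: float = 0.4
    human: float = 0.2


def joint_reward(
    aesthetic: float,
    semantic: float,
    human: Optional[float] = None,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Combine the three reward signals as a weighted sum (an assumed form).

    If `human` is None, as in a pre-training stage that uses only automated
    rewards, the two automated terms are renormalised so the result stays
    on the same scale as the full reward.
    """
    automated = weights.aesthetic * aesthetic + weights.semantic * semantic
    if human is None:
        return automated / (weights.aesthetic + weights.semantic)
    return automated + weights.human * human


# Stage 1 (assumed): automated signals only.
pretrain_r = joint_reward(aesthetic=1.0, semantic=0.0)

# Stage 2 (assumed): a human preference score joins the sum.
finetune_r = joint_reward(aesthetic=1.0, semantic=1.0, human=1.0)
```

In this sketch, switching from pre-training to fine-tuning changes only whether a human score is passed in, so the same reward function could serve both stages. How the paper actually schedules or weights the human signal is not stated in the abstract.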