This paper proposes a multi-algorithm ensemble deep reinforcement learning framework for optimizing multi-channel advertising budgets in dynamic digital environments. Budget allocation across six advertising channels is formulated as a sequential decision problem with simplex-constrained actions. Recurrent actor–critic agents with Dirichlet exploration and renormalization preserve budget feasibility throughout training. Three modern actor–critic algorithms (DDPG, TD3, and SAC) are optimized using Optuna-based hyperparameter search (200 trials per configuration) and trained for 1,000 episodes under both shared-LSTM and separate-LSTM architectures. The best-performing checkpoints are combined into a compact ensemble of complementary agents. At each decision step, ensemble strategies such as voting and best-of-N select the candidate action with the highest one-step simulated reward, obtained by evaluating each candidate in a copy of the environment simulator under the current campaign state. Under a campaign-level offline evaluation on 107 real-world campaigns (c1–c39 for training and c40–c107 for evaluation), the best ensemble strategy achieves a mean episode reward of 17.22, outperforming the real-data baseline (14.48) by 18.9% and slightly exceeding the best single model (SAC with shared LSTM, 17.02). These results provide offline, simulation-based evidence that combining diverse policies can improve robustness and performance over single-agent baselines in advertising budget allocation under realistic operational constraints.
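The simplex constraint and the best-of-N selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: `ToySimulator`, its square-root reward, the `effectiveness` weights, and the `concentration` and `mix` parameters are all hypothetical stand-ins chosen for illustration. The sketch assumes that renormalization means clipping negative shares and rescaling to sum to one, and that exploration blends an action with a Dirichlet sample centred on it.

```python
import copy
import math
import random


def renormalize(action):
    """Clip negatives to zero and rescale so the budget shares sum to 1."""
    clipped = [max(0.0, float(x)) for x in action]
    total = sum(clipped)
    if total <= 0.0:
        return [1.0 / len(clipped)] * len(clipped)  # fall back to an even split
    return [x / total for x in clipped]


def dirichlet_explore(action, rng, concentration=50.0, mix=0.2):
    """Blend an allocation with a Dirichlet sample centred on it (assumed scheme)."""
    base = renormalize(action)
    # A Dirichlet draw is a vector of independent Gamma draws, normalized.
    gammas = [rng.gammavariate(concentration * b + 1e-3, 1.0) for b in base]
    noise = renormalize(gammas)
    return renormalize([(1 - mix) * b + mix * n for b, n in zip(base, noise)])


class ToySimulator:
    """Hypothetical stand-in for the environment simulator (not the paper's)."""

    def __init__(self, effectiveness):
        self.effectiveness = list(effectiveness)
        self.steps_taken = 0

    def step(self, allocation):
        self.steps_taken += 1
        # Concave (square-root) response: diminishing returns per channel.
        return sum(e * math.sqrt(a) for e, a in zip(self.effectiveness, allocation))


def best_of_n(sim, candidates):
    """Score each candidate in a copy of the simulator and return the best one."""
    feasible = [renormalize(c) for c in candidates]
    rewards = [copy.deepcopy(sim).step(a) for a in feasible]
    best = max(range(len(feasible)), key=lambda i: rewards[i])
    return best, feasible[best], rewards[best]


rng = random.Random(0)
sim = ToySimulator([1.0, 0.8, 0.6, 0.5, 0.4, 0.3])  # six channels, made-up weights
candidates = [
    [1, 1, 1, 1, 1, 1],                          # even split
    [4, 2, 1, 1, 1, 1],                          # tilted toward stronger channels
    dirichlet_explore([4, 2, 1, 1, 1, 1], rng),  # perturbed variant
]
index, allocation, reward = best_of_n(sim, candidates)
```

Because each candidate is scored on a deep copy, the live simulator state is left untouched (`sim.steps_taken` stays at 0), and every candidate is renormalized before scoring so the selected allocation always sums to the full budget.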
Danaei et al. (Mon,) studied this question.