What question did this study set out to answer?

This research aims to optimize multi-channel advertising budgets using a novel deep reinforcement learning framework.

June 10, 2026Open Access

An Ensemble Deep Reinforcement Learning Framework for Multi-Channel Advertising Budget Optimization: A Practical AI Approach

Key Points

This research aims to optimize multi-channel advertising budgets using a novel deep reinforcement learning framework.
Developed a multi-algorithm ensemble framework using DDPG, TD3, and SAC algorithms.
Utilized Optuna for hyperparameter optimization over 200 trials and trained agents for 1,000 episodes.
Evaluated strategies in simulated environments against 107 real-world campaigns.
Achieved a mean episode reward of 17.22 using the ensemble strategy, an 18.9% improvement over the baseline reward of 14.48.
Slightly exceeded the best single model (SAC with shared LSTM) with a reward of 17.02.
Demonstrated that diverse policy combinations enhance robustness in budget allocation.

Abstract

This paper proposes a multi-algorithm ensemble deep reinforcement learning framework for optimizing multi-channel advertising budgets in dynamic digital environments. Budget allocation across six advertising channels is formulated as a sequential decision problem with simplex-constrained actions. Recurrent actor–critic agents with Dirichlet exploration and renormalization preserve budget feasibility throughout training. Three modern actor–critic algorithms (DDPG, TD3, and SAC) are optimized using Optuna-based hyperparameter search (200 trials per configuration) and trained for 1,000 episodes under both shared-LSTM and separate-LSTM architectures. The best-performing checkpoints are combined into a compact ensemble of complementary agents. At each decision step, strategies such as voting and best-of-N select the action with the highest one-step simulated reward obtained by evaluating each candidate action in a copy of the environment simulator under the current campaign state. Under a campaign-level offline evaluation on 107 real-world campaigns (c1–c39 for training and c40–c107 for evaluation), the best ensemble strategy achieves a mean episode reward of 17.22, outperforming the real-data baseline (14.48) by 18.9% and slightly exceeding the best single model (SAC with shared LSTM, 17.02). These results provide offline simulation-based evidence that combining diverse policies can improve robustness and performance over single-agent baselines in advertising budget allocation under realistic operational constraints.

Read Full Paperexternally

اسأل الذكاء الاصطناعي

Bookmark

View Full Paper