The rapid evolution of programmatic advertising has introduced complex challenges in real-time bidding (RTB), where advertisers must make instantaneous and near-optimal bidding decisions in dynamic, partially observable environments. Traditional rule-based and statistical approaches struggle to adapt to fluctuations in user behaviour, market competition and platform dynamics, which frequently results in inefficient budget utilization and suboptimal campaign performance. This paper presents AdOptima RL, an intelligent advertisement optimization framework that leverages deep reinforcement learning (RL) to dynamically learn bidding strategies. The proposed system models the RTB problem as a Markov Decision Process (MDP) and integrates three RL algorithms, namely Deep Q-Network (DQN), Proximal Policy Optimization (PPO) and an Actor-Critic method, that collectively cover both discrete and continuous bidding action spaces. A correlation-based feature-weighting mechanism enriches the state representation by emphasising attributes that are statistically predictive of bidding outcomes, accelerating learning and improving decision quality. A per-website agent specialization strategy further allows the framework to capture domain-specific dynamics across different advertising platforms, avoiding the generalization limits of a single monolithic model. In addition to bid optimization, the framework includes a budget allocation module that simulates campaign performance using trained agents and produces data-driven recommendations for distributing advertising spend across platforms. Experimental evaluation on a real-world-inspired auction dataset demonstrates that AdOptima RL improves click-through rate and budget utilization efficiency relative to traditional methods, with PPO performing best in continuous action spaces, DQN delivering low-latency decisions in discrete spaces, and Actor-Critic offering stable convergence on long horizons.
The findings highlight the potential of reinforcement learning to transform digital advertising into an adaptive, intelligent and performance-driven decision process.
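As an illustration of the correlation-based feature weighting summarised above, the following is a minimal sketch rather than the framework's actual implementation. It assumes that the weights are derived from the absolute Pearson correlation between each state feature and a logged bidding outcome (for example, a click indicator), normalised to sum to one. The function names `correlation_weights` and `weight_state`, and the choice of Pearson correlation, are hypothetical and are not specified in the text.

```python
import numpy as np

def correlation_weights(features, outcomes, eps=1e-12):
    """Return one non-negative weight per feature column, summing to 1.

    Assumption: weight = |Pearson correlation with the outcome|, normalised.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    Xc = X - X.mean(axis=0)          # centre each feature column
    yc = y - y.mean()                # centre the outcome
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    cov = (Xc * yc[:, None]).sum(axis=0)
    # Constant columns have zero variance; give them zero correlation.
    corr = np.where(denom > eps, cov / np.maximum(denom, eps), 0.0)
    strength = np.abs(corr)
    total = strength.sum()
    if total <= eps:
        # No feature is informative: fall back to uniform weights.
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return strength / total

def weight_state(state, weights):
    """Scale a raw state vector element-wise by the feature weights."""
    return np.asarray(state, dtype=float) * weights

# Toy log: column 0 tracks the outcome, column 1 is constant,
# column 2 is perfectly anti-correlated with the outcome.
X = [[0, 5, 1], [1, 5, 0], [0, 5, 1], [1, 5, 0]]
y = [0, 1, 0, 1]
w = correlation_weights(X, y)
weighted = weight_state([2.0, 2.0, 2.0], w)
```

In this toy log the constant feature receives zero weight, while the two perfectly (anti-)correlated features share the weight equally. The weighted vector would then be passed to the agent in place of the raw state.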
Kumar et al. (Thu,) studied this question.