CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) systems remain a widely deployed defense against automated abuse, but advances in machine learning have reduced the effectiveness of traditional challenge-based designs and exposed limitations in proprietary risk-scoring systems. This paper presents an adaptive, reinforcement learning-based CAPTCHA defense framework for high-security web applications. The proposed system formulates bot detection as a partially observable Markov decision process and uses a Proximal Policy Optimization (PPO) agent with a Long Short-Term Memory (LSTM) network to analyze streamed behavioral telemetry, including mouse movements, clicks, keystrokes, and scrolling, over sequential interaction windows. During the observation phase, the agent can continue observing or deploy a honeypot as an early-intervention and evidence-gathering action; after sufficient session evidence is accumulated, it can issue graded CAPTCHA challenges, allow a session, or block it. To complement the sequential agent, the framework also includes an XGBoost classifier that produces a session-level human-likelihood score as a supervised benchmark. The accompanying reinforcement learning environment and code base are publicly available, allowing future researchers to train, evaluate, and extend adaptive CAPTCHA policies as bot capabilities evolve. Experiments conducted on a sandbox ticket-purchasing web application demonstrate that the proposed methodology achieves strong preliminary performance on human-generated sessions and real bot sessions produced by scripted, replay-based, and Large Language Model (LLM)-powered agents. Among the evaluated reinforcement learning algorithm variants, Soft PPO achieved the best performance with 97.7% accuracy, 100% precision, and a 97.6% F1 score. For comparison, the XGBoost classifier achieved 99.48% accuracy, a 1.000 ROC-AUC (receiver operating characteristic area under the curve), and a 0.9919 F1 score.
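As a reading aid for the Soft PPO figures, precision and F1 together determine recall through the identity F1 = 2PR / (P + R). The sketch below solves that identity for R. The function name is hypothetical, the inputs are the rounded values quoted above, and the abstract does not state which class is treated as positive, so the result is only an approximate implied value rather than a reported one.

```python
def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for recall R, given precision P and F1."""
    denominator = 2.0 * precision - f1
    if denominator <= 0.0:
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denominator


# Rounded Soft PPO figures from the abstract: 100% precision, 97.6% F1.
implied_recall = recall_from_precision_f1(precision=1.0, f1=0.976)
print(f"implied recall ~ {implied_recall:.3f}")  # prints "implied recall ~ 0.953"
```

Under these rounded inputs, perfect precision with an F1 of 97.6% corresponds to a recall of roughly 95%, which suggests that the remaining errors are missed positives rather than false alarms.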
Our results indicate that sequential reinforcement learning can support accurate and low-friction bot detection, while the accompanying classifier provides a complementary binary benchmark. Compared with proprietary systems, the proposed framework emphasizes transparency, auditability, and explicit sequential decision-making rather than black-box risk scoring. Overall, this work introduces an open, publicly available, and adaptive CAPTCHA defense framework that supports transparent experimentation with behavior-based bot mitigation while also identifying the remaining limits that must be addressed before commercial deployment.
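The two-phase decision structure described in the abstract (observe or deploy a honeypot first, then challenge, allow, or block) can be illustrated with a small action-masking sketch. Everything named here is an assumption made for illustration and is not taken from the released code base: the `Action` and phase names, the use of exactly two CAPTCHA difficulty grades, and the fixed `min_windows` threshold standing in for "sufficient session evidence".

```python
from enum import Enum, auto
from typing import FrozenSet, List


class Action(Enum):
    # Observation-phase actions
    CONTINUE = auto()
    HONEYPOT = auto()
    # Decision-phase actions (two CAPTCHA grades are an assumption)
    EASY_CAPTCHA = auto()
    HARD_CAPTCHA = auto()
    ALLOW = auto()
    BLOCK = auto()


OBSERVATION_ACTIONS: FrozenSet[Action] = frozenset(
    {Action.CONTINUE, Action.HONEYPOT}
)
DECISION_ACTIONS: FrozenSet[Action] = frozenset(
    {Action.EASY_CAPTCHA, Action.HARD_CAPTCHA, Action.ALLOW, Action.BLOCK}
)


def valid_actions(windows_seen: int, min_windows: int) -> FrozenSet[Action]:
    """Return the actions permitted after `windows_seen` telemetry windows.

    Assumption: evidence counts as sufficient once a fixed number of
    interaction windows has been observed.
    """
    if windows_seen >= min_windows:
        return DECISION_ACTIONS
    return OBSERVATION_ACTIONS


def action_mask(windows_seen: int, min_windows: int) -> List[bool]:
    """Boolean mask over Action (in definition order) for a policy head."""
    allowed = valid_actions(windows_seen, min_windows)
    return [action in allowed for action in Action]


print(action_mask(windows_seen=1, min_windows=3))
print(action_mask(windows_seen=5, min_windows=3))
```

A mask of this kind is one common way to restrict a policy to phase-appropriate actions; whether the framework uses masking, reward shaping, or separate policy heads is not stated in the abstract.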
Indukuri et al. studied this question.