What question did this study set out to answer?

The study aims to develop an adaptive CAPTCHA defense framework that uses reinforcement learning to detect bots in web applications.

June 3, 2026 | Open Access

Designing CAPTCHA Systems with Reinforcement Learning for Adaptive Defense

Key Points

Designed a reinforcement learning CAPTCHA system based on a partially observable Markov decision process.
Employed a Proximal Policy Optimization agent analyzing mouse movements and clicks for bot detection.
Utilized an XGBoost classifier to provide a benchmark human-likelihood score across web application sessions.
Soft PPO achieved 97.7% accuracy, 100% precision, and a 97.6% F1 score in bot detection.
The XGBoost classifier reached 99.48% accuracy and a 1.000 ROC-AUC score.
The adaptive framework emphasizes transparency and auditability over the black-box risk scoring of proprietary systems.
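
The Soft PPO figures above report precision and F1 but not recall. As a rough consistency check, recall can be recovered from the definition F1 = 2PR / (P + R). The sketch below is illustrative only: the function name is hypothetical, and because the reported percentages are rounded, the result is an approximation rather than a value stated in the study.

```python
def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for R, giving R = F1*P / (2*P - F1)."""
    denom = 2.0 * precision - f1
    if denom <= 0.0:
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denom


# Reported Soft PPO figures: 100% precision, 97.6% F1 (both rounded).
implied_recall = recall_from_precision_f1(precision=1.0, f1=0.976)
print(f"implied recall: {implied_recall:.3f}")  # prints "implied recall: 0.953"
```

With perfect precision, every error behind the reported F1 score would be a missed bot rather than a misclassified human. That pattern is consistent with the low-friction goal described in the abstract.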

Abstract

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) systems remain a widely deployed defense against automated abuse, but advances in machine learning have reduced the effectiveness of traditional challenge-based designs and exposed limitations in proprietary risk-scoring systems. This paper presents an adaptive, reinforcement learning-based CAPTCHA defense framework for high-security web applications. The proposed system formulates bot detection as a partially observable Markov decision process and uses a Proximal Policy Optimization (PPO) agent with Long Short-Term Memory to analyze streamed behavioral telemetry, including mouse movements, clicks, keystrokes, and scrolling, over sequential interaction windows. During the observation phase, the agent can continue observing or deploy a honeypot as an early-intervention and evidence-gathering action; after sufficient session evidence is accumulated, it can issue graded CAPTCHA challenges, allow a session, or block it. To complement the sequential agent, the framework also includes an XGBoost classifier that produces a session-level human-likelihood score as a supervised benchmark. The accompanying reinforcement learning environment and code base are publicly available, allowing future researchers to train, evaluate, and extend adaptive CAPTCHA policies as bot capabilities evolve. Experiments conducted on a sandbox ticket-purchasing web application demonstrate that the proposed methodology achieves strong preliminary performance on human-generated sessions and real bot sessions produced by scripted, replay-based, and Large Language Model (LLM)-powered agents. Among the evaluated reinforcement learning algorithm variants, Soft PPO achieved the best performance with 97.7% accuracy, 100% precision, and a 97.6% F1 score. For comparison, the XGBoost classifier achieved 99.48% accuracy, a 1.000 ROC-AUC (receiver operating characteristic area under the curve), and a 0.9919 F1 score.
Our results indicate that sequential reinforcement learning can support accurate and low-friction bot detection, while the accompanying classifier provides a complementary binary benchmark. Compared with proprietary systems, the proposed framework emphasizes transparency, auditability, and explicit sequential decision-making rather than black-box risk scoring. Overall, this work introduces an open, adaptive CAPTCHA defense framework that supports transparent experimentation with behavior-based bot mitigation while also identifying the remaining limits that must be addressed before commercial deployment.
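
The two-phase action structure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the action names, the three CAPTCHA difficulty grades, the `MIN_WINDOWS` threshold, and the strict separation between the two phases are all hypothetical choices made for the example.

```python
from __future__ import annotations

from enum import Enum, auto


class Action(Enum):
    # Observation-phase actions named in the abstract.
    CONTINUE_OBSERVING = auto()
    DEPLOY_HONEYPOT = auto()
    # Decision-phase actions; the three grades are an assumption.
    CAPTCHA_EASY = auto()
    CAPTCHA_MEDIUM = auto()
    CAPTCHA_HARD = auto()
    ALLOW = auto()
    BLOCK = auto()


OBSERVATION_ACTIONS = frozenset({Action.CONTINUE_OBSERVING, Action.DEPLOY_HONEYPOT})
DECISION_ACTIONS = frozenset(
    {Action.CAPTCHA_EASY, Action.CAPTCHA_MEDIUM, Action.CAPTCHA_HARD,
     Action.ALLOW, Action.BLOCK}
)

# Hypothetical number of interaction windows treated as "sufficient evidence".
MIN_WINDOWS = 5


def valid_actions(windows_seen: int, min_windows: int = MIN_WINDOWS) -> frozenset[Action]:
    """Return the actions available after `windows_seen` telemetry windows."""
    if windows_seen < min_windows:
        return OBSERVATION_ACTIONS
    return DECISION_ACTIONS


print(sorted(a.name for a in valid_actions(windows_seen=2)))
print(sorted(a.name for a in valid_actions(windows_seen=5)))
```

A mask of this kind is one common way to restrict a policy's choices by phase. Whether the published environment uses a fixed window count, a learned stopping rule, or another mechanism is not stated in the abstract.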
