What question did this study set out to answer?

The aim is to improve click fraud detection by developing new composite features based on publisher behavior.

February 26, 2026 | Open Access

Modeling publisher behavior through conditional feature patterns for fraudulent activity detection

Key Points

Developed sixteen novel composite features from clickstream data
Evaluated features on the FDMA2012 dataset
Conducted experiments comparing baseline and enhanced feature sets
Used tenfold cross-validation and the Kruskal-Wallis test for feature relevance
Enhanced feature set improved average precision to 86.12%
Recall reached 89.61%, with an F1-score of 90.14%
Newly engineered features significantly outperformed baseline configurations

Abstract

Detecting click fraud is a critical challenge in digital advertising, particularly because user-click datasets lack reliable labeled ground truth that separates genuine publishers from fraudulent ones. Existing feature sets often fall short in capturing the subtle behavioral changes and disguise strategies adopted by fraudulent actors. To address this, we propose sixteen novel composite features created by aggregating key clickstream attributes over fine-grained temporal intervals, using statistics such as mean, variance, skewness, and standard deviation, to better represent evolving behavioral patterns. The proposed features were evaluated on the FDMA2012 dataset under three experimental setups: (i) baseline features only, (ii) a combined feature set that adds the newly designed features, and (iii) feature relevance ranking using the Kruskal-Wallis test. Using tenfold cross-validation, the enhanced 119-feature configuration demonstrated substantial gains in fraud detection performance. The best-performing classifier achieved an average precision of 86.12%, recall of 89.61%, and F1-score of 90.14%, clearly outperforming the baseline feature set. These improvements confirm that the newly engineered features significantly strengthen the classifier’s capability in distinguishing fraudulent publishers from legitimate ones, contributing to more reliable and proactive fraud mitigation in real-world advertising environments.
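The kind of aggregation the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not list the sixteen features themselves. The sketch assumes a hypothetical click log of (publisher ID, timestamp in seconds) pairs, a 60-second interval, and click count per interval as the aggregated attribute; the function names are likewise illustrative.

```python
from collections import Counter, defaultdict
from statistics import fmean, pstdev, pvariance


def skewness(values):
    """Population skewness; returns 0.0 when the values do not vary."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0
    return fmean([((v - mu) / sigma) ** 3 for v in values])


def interval_click_features(clicks, interval_seconds=60):
    """Summarize each publisher's clicks-per-interval series.

    clicks: iterable of (publisher_id, timestamp_seconds) pairs.
    This schema and the 60-second default are assumptions for illustration.
    """
    clicks = list(clicks)
    per_publisher = defaultdict(Counter)
    for publisher, ts in clicks:
        per_publisher[publisher][int(ts // interval_seconds)] += 1

    # Use one shared range of intervals so that idle intervals count as zeros.
    all_bins = [int(ts // interval_seconds) for _, ts in clicks]
    first, last = min(all_bins), max(all_bins)

    features = {}
    for publisher, per_bin in per_publisher.items():
        series = [per_bin[b] for b in range(first, last + 1)]
        features[publisher] = {
            "mean": fmean(series),
            "variance": pvariance(series),
            "std": pstdev(series),
            "skewness": skewness(series),
        }
    return features


# Toy log: pub_a clicks steadily, pub_b clicks in a burst.
clicks = [
    ("pub_a", 5), ("pub_a", 65), ("pub_a", 125),
    ("pub_b", 10), ("pub_b", 11), ("pub_b", 12), ("pub_b", 130),
]
features = interval_click_features(clicks)
```

In this toy log the steady publisher has zero variance across intervals, while the bursty one has higher variance and positive skewness. Differences of this kind are what interval-level statistics are meant to expose to a classifier.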
