Abstract: Detecting click fraud is a critical challenge in digital advertising, particularly because user-click datasets lack reliable labeled ground truth that distinguishes genuine from fraudulent publishers. Existing feature sets often fail to capture the subtle behavioral changes and disguise strategies adopted by fraudulent actors. To address this, we propose sixteen novel composite features, created by aggregating key clickstream attributes with summary statistics (mean, variance, standard deviation, and skewness) over fine-grained temporal intervals, to better represent evolving behavioral patterns. The proposed features were evaluated on the FDMA2012 dataset under three experimental setups: (i) baseline features only, (ii) baseline features combined with the newly designed features, and (iii) feature relevance ranking using the Kruskal-Wallis test. Under tenfold cross-validation, the enhanced 119-feature configuration demonstrated substantial gains in fraud detection performance. The best-performing classifier achieved an average precision of 86.12%, recall of 89.61%, and F1-score of 90.14%, clearly outperforming the baseline feature set. These improvements indicate that the newly engineered features substantially strengthen the classifier’s ability to distinguish fraudulent publishers from legitimate ones, contributing to more reliable and proactive fraud mitigation in real-world advertising environments.
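The kind of interval-level aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `interval_click_stats`, the `(publisher_id, timestamp_seconds)` input layout, the 60-second interval width, and the choice of click counts as the aggregated attribute are all assumptions made for illustration. The abstract does not specify which attributes or interval lengths were used.

```python
from collections import Counter, defaultdict
from math import sqrt


def interval_click_stats(clicks, interval_seconds=60):
    """Summarise per-publisher click counts over fixed-width time intervals.

    clicks: iterable of (publisher_id, timestamp_seconds) pairs (assumed layout).
    interval_seconds: interval width; 60 s is an illustrative assumption.
    Returns {publisher_id: {"mean", "variance", "std", "skewness"}}.
    """
    # Count clicks per (publisher, interval index).
    per_publisher = defaultdict(Counter)
    for publisher_id, ts in clicks:
        per_publisher[publisher_id][int(ts // interval_seconds)] += 1

    features = {}
    for publisher_id, buckets in per_publisher.items():
        first, last = min(buckets), max(buckets)
        # Intervals with no clicks between the first and last click count as zero.
        counts = [buckets.get(b, 0) for b in range(first, last + 1)]
        n = len(counts)
        mean = sum(counts) / n
        # Population (biased) variance and skewness, chosen for simplicity.
        variance = sum((c - mean) ** 2 for c in counts) / n
        std = sqrt(variance)
        if std > 0:
            skewness = (sum((c - mean) ** 3 for c in counts) / n) / std ** 3
        else:
            skewness = 0.0  # constant series: skewness defined as 0 here
        features[publisher_id] = {
            "mean": mean,
            "variance": variance,
            "std": std,
            "skewness": skewness,
        }
    return features
```

Calling `interval_click_stats([("p1", 0), ("p1", 10), ("p1", 70)])` yields one feature dictionary for publisher `p1`, computed from the counts in its two 60-second intervals. In a fuller pipeline, such per-publisher statistics would be appended to the baseline feature vector before classification.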
Singh et al. studied this question.