This paper evaluates the performance of seven machine learning (ML) algorithms for anomaly detection on the Numenta Anomaly Benchmark (NAB) dataset. The algorithms examined are Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machines (SVM), Neural Networks (NN), K-Nearest Neighbors (KNN) and Naive Bayes (NB). Two experimental setups were used: one evaluating the models without additional features, and another incorporating engineered features such as lagged values, rolling-window statistics, difference values and time-based features (hour, day of the year and weekend). The models were trained on the NAB dataset, and their effectiveness in detecting anomalies was assessed using the standard classification metrics Precision, Recall and F1-Score. In the experiment without additional features, the NN model showed the highest overall performance, with an F1-Score of 0.0626526, Precision of 0.0542125 and Recall of 0.0961538, predicting anomalies in 9 files. LR achieved the highest Recall (0.192029) but a low Precision (0.0226541): it predicted anomalies in a large number of files (38) at the cost of many false positives. KNN failed to detect any anomalies in either experiment. Adding features degraded performance for most models. For instance, the NN F1-Score decreased to 0.0377358 with features, suggesting that the added features did not enhance, and in some cases hindered, the models’ anomaly detection capabilities. Some models, such as LR and SVM, also showed an increase in the number of files with errors when features were included. The analysis indicates that while some models are effective at recalling anomalies, they tend to classify a significant amount of normal data as anomalous (low Precision).
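To make the Precision/Recall trade-off described above concrete, the following is a minimal sketch of how the three metrics can be computed from binary point labels (1 = anomaly, 0 = normal). The function name `precision_recall_f1` and the toy labels are illustrative assumptions, not the evaluation code or data used in the study.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute Precision, Recall and F1 for the anomaly class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Convention assumed here: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical labels: two true anomalies among ten points.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
# A model that flags many points: catches one anomaly but raises four false alarms.
y_flag_many = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
# A model that never flags anything.
y_flag_none = [0] * 10

p_many, r_many, f_many = precision_recall_f1(y_true, y_flag_many)
p_none, r_none, f_none = precision_recall_f1(y_true, y_flag_none)
```

In this toy case the first model reaches a Recall of 0.5 with a Precision of only 0.2, the same qualitative pattern as a high-Recall, low-Precision detector, while the second scores zero on all three metrics because it never predicts an anomaly.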
The study highlights the critical impact of feature engineering on anomaly detection performance.
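As an illustration of the kinds of engineered features named above (lagged values, rolling-window statistics, difference values and time-based features), here is a minimal sketch using only the Python standard library. The function `build_features`, its parameters `lag` and `window`, the dictionary keys and the sample series are all hypothetical; the window sizes and exact feature definitions used in the study are not stated here.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def build_features(
    timestamps: Sequence[datetime],
    values: Sequence[float],
    lag: int = 1,      # assumed lag length
    window: int = 3,   # assumed rolling-window length
) -> List[Dict[str, object]]:
    """Build one feature row per point that has enough history."""
    rows: List[Dict[str, object]] = []
    # Skip the first points, which lack a lagged value or a full window.
    start = max(lag, window - 1, 1)
    for i in range(start, len(values)):
        ts = timestamps[i]
        # Window covers the current point and the (window - 1) preceding points.
        recent = values[i - window + 1 : i + 1]
        rows.append(
            {
                "value": values[i],
                "lag": values[i - lag],              # lagged value
                "diff": values[i] - values[i - 1],   # first difference
                "rolling_mean": mean(recent),
                "rolling_std": pstdev(recent),
                "hour": ts.hour,
                "day_of_year": ts.timetuple().tm_yday,
                "is_weekend": ts.weekday() >= 5,     # Saturday = 5, Sunday = 6
            }
        )
    return rows


# Hypothetical hourly series with one spike.
start_time = datetime(2014, 4, 5, 22, 0)
timestamps = [start_time + timedelta(hours=h) for h in range(5)]
values = [10.0, 12.0, 11.0, 50.0, 12.0]

rows = build_features(timestamps, values, lag=1, window=3)
```

Each row can then be passed to a classifier alongside, or instead of, the raw value. Note that the difference and rolling statistics react to the spike on the following points as well, which is one way such features can add noise rather than signal.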
Abel Desalegn Demeke (Tue,) studied this question.