Abstract: The effectiveness of Outlier Detection (OD) is highly sensitive to two inherent properties of the data: its dimensionality (one-dimensional versus multidimensional) and its statistical distribution (normal versus non-normal). Crossing these two properties yields four data scenarios. This research addresses the need for systematic technique selection by comparing OD algorithms across all four. The techniques investigated range from classical statistical methods, such as the Z-score and Mahalanobis Distance, to ensemble and density-based models, such as Isolation Forest (iForest) and Local Outlier Factor (LOF). The study evaluates the precision, recall, and computational efficiency of each method on diverse datasets. The primary contribution is an evidence-based framework that gives practitioners structured guidance for selecting the most suitable OD strategy in each scenario, improving the robustness and integrity of data preprocessing pipelines.
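To illustrate why dimensionality matters when choosing a method, here is a minimal Python sketch. It uses textbook definitions and hypothetical toy data, not the study's own implementation or datasets. It contrasts a univariate Z-score rule with a two-dimensional Mahalanobis Distance rule and scores the result with precision and recall.

For two degrees of freedom, the Mahalanobis cutoff is taken from the chi-square distribution, assuming the inliers are roughly bivariate normal. The function names, the thresholds (2.5, 3.0, p = 0.975), and the data are all illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def zscore_outliers(xs, threshold=3.0):
    """Indices whose |z| exceeds `threshold` (1-D data, roughly normal)."""
    mu, sigma = mean(xs), pstdev(xs)  # population standard deviation
    if sigma == 0:
        return []  # constant data: nothing can be flagged
    return [i for i, x in enumerate(xs) if abs(x - mu) / sigma > threshold]


def mahalanobis_outliers_2d(points, p=0.975):
    """Indices whose squared Mahalanobis distance exceeds the chi-square
    (2 degrees of freedom) quantile at probability `p`. Assumes inliers are
    roughly bivariate normal; mean and covariance are estimated from all
    points, including any outliers."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # For 2 degrees of freedom the chi-square quantile has a closed form.
    cutoff = -2.0 * math.log(1.0 - p)
    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^{-1} [dx dy]^T via the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > cutoff:
            flagged.append(i)
    return flagged


def precision_recall(flagged, true_outliers):
    """Precision and recall of flagged indices; 0.0 when a denominator is empty."""
    flagged, truth = set(flagged), set(true_outliers)
    tp = len(flagged & truth)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical 1-D sample with one injected outlier at index 9.
    xs = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
    print(zscore_outliers(xs, threshold=2.5))  # -> [9]

    # Hypothetical correlated 2-D sample: 16 inliers along y ~ x, plus (8, -8).
    inliers = [(2 * a + b, 2 * a - b) for a in (-4, -3, -2, -1, 1, 2, 3, 4) for b in (-1, 1)]
    data = inliers + [(8, -8)]
    flagged = mahalanobis_outliers_2d(data)
    print(flagged, precision_recall(flagged, [len(data) - 1]))  # -> [16] (1.0, 1.0)
```

In this toy data, the point (8, -8) falls inside the marginal range of both coordinates. A per-coordinate Z-score at the conventional threshold of 3 therefore misses it. The Mahalanobis rule flags it because the point breaks the correlation between x and y.

Small samples also need care. With the population standard deviation, |z| can never exceed sqrt(n - 1). With the ten-point sample above, a threshold of 3 therefore cannot fire, which is why the sketch uses 2.5 there.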
Shah et al. (Thu,) studied this question.