March 28, 2026 | Open Access

An Empirical Framework for Outlier Detection Based on Data Distribution and Dimensionality

Key points

The aim is to develop a systematic approach for selecting outlier detection methods based on data characteristics.
Comparative analysis of four data scenarios: one-dimensional normal, one-dimensional non-normal, multidimensional normal, and multidimensional non-normal.
Evaluation of several outlier detection algorithms including Z-score, Mahalanobis Distance, Isolation Forest, and Local Outlier Factor.
Assessment of algorithms based on precision, recall, and computational efficiency using diverse datasets.
The framework identifies optimal outlier detection techniques based on specific data properties.
Classical methods showed varying effectiveness compared to ensemble and density-based models.
Clear guidelines enhance the robustness of data preprocessing pipelines.
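As a rough illustration of the one-dimensional, approximately normal scenario and of the precision and recall metrics listed above, the sketch below flags points by Z-score and scores the result against known labels. It is not taken from the study: the function names, the threshold of 3, and the toy data are all assumptions chosen for illustration.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the indices whose absolute Z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return set()  # all values identical, nothing to flag
    return {i for i, v in enumerate(values) if abs(v - mean) / std > threshold}

def precision_recall(flagged, truth):
    """Precision and recall of flagged indices against known outlier indices."""
    true_positives = len(flagged & truth)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical toy data: 19 readings near 10 plus one injected outlier (index 19).
values = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9,
          10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 25.0]
truth = {19}

flagged = zscore_outliers(values)
precision, recall = precision_recall(flagged, truth)
print(flagged, precision, recall)
```

Because the mean and standard deviation are computed from data that include the outlier, a single extreme value inflates both. This is one reason a fixed Z-score threshold may behave poorly on small or heavily contaminated samples.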

Abstract

The effectiveness of Outlier Detection (OD) is highly sensitive to the data’s inherent properties, specifically its dimensionality (one-dimensional versus multidimensional) and its statistical distribution (normal versus non-normal). This research addresses the need for systematic technique selection by presenting a comparative analysis of OD algorithms across the four data scenarios that these two properties define. The techniques investigated range from classical statistical methods, such as the Z-score and Mahalanobis Distance, to ensemble and density-based models such as Isolation Forest (iForest) and Local Outlier Factor (LOF). The study evaluates the precision, recall, and computational efficiency of these methods using diverse datasets. The primary contribution is an evidence-based framework that gives practitioners clear, structured guidance for selecting an OD strategy suited to their data, thereby improving the robustness and integrity of data preprocessing pipelines.
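For the multidimensional, approximately normal scenario, the Mahalanobis Distance mentioned above can be sketched in two dimensions with the standard library alone. This is an illustrative sketch, not the study's implementation: the function name, the 0.975 coverage level, and the toy points are assumptions. The cutoff uses the fact that, for bivariate normal data, the squared distance approximately follows a chi-square distribution with 2 degrees of freedom, whose quantile at probability p is -2 ln(1 - p).

```python
import math

def mahalanobis_outliers_2d(points, coverage=0.975):
    """Flag 2-D points whose squared Mahalanobis distance exceeds the
    chi-square (2 degrees of freedom) quantile at `coverage`."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix entries (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form chi-square quantile for 2 degrees of freedom.
    cutoff = -2.0 * math.log(1.0 - coverage)
    flagged = set()
    for i, (x, y) in enumerate(points):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, using the 2x2 inverse.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > cutoff:
            flagged.add(i)
    return flagged

# Hypothetical toy data: 16 points scattered around the origin
# plus one injected outlier at index 16.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1),
          (2, 0), (-2, 0), (0, 2), (0, -2),
          (1, 0), (-1, 0), (0, 1), (0, -1),
          (0.5, 0.5), (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5),
          (8, 8)]

print(mahalanobis_outliers_2d(points))
```

The mean and covariance here are estimated from all points, outlier included, so the estimate is not robust. With heavy contamination or non-normal data, the chi-square cutoff may no longer be appropriate, which is the kind of case where the abstract points to iForest or LOF as alternatives.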
