Outlier detection is a fundamental component of data preprocessing and quality monitoring across diverse scientific domains, including engineering, biomedical sciences, and finance. While many variables in controlled environments approximate a normal distribution, real-world data, particularly biological, environmental, and epidemiological measures, are frequently characterized by pronounced right-skewness. To address the shortcomings of conventional methods, this study introduces the Dynamic Threshold for Outlier Detection (DTOD), which reframes outlier detection as a concrete operational workflow. The DTOD framework dynamically adjusts detection thresholds based on a functional relationship between skewness and tail morphology. Validation through large-scale simulation experiments across light-, middle-, and high-skewness levels confirms the method’s versatility. The DTOD proves particularly effective at two ends of the spectrum: enhancing sensitivity for detecting subtle anomalies in light-skewed data while serving as a conservative, high-confidence screening tool that controls false positives in high-skewness environments. In real-world application to North American Association of Central Cancer Registries (NAACCR) data, the method successfully identified outliers with abnormally high unknown tumor size rates in colorectal cancer and maintained a low misclassification rate in highly skewed lung cancer data. Ultimately, the DTOD provides a promising, interpretable solution for improving data quality in skewed scenarios.
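The abstract does not state the functional relationship DTOD uses, so the following is only a minimal sketch of the general idea of a skewness-dependent threshold, not the authors' method. It widens or narrows Tukey's IQR fences by an exponential function of Bowley's quartile skewness, loosely in the spirit of adjusted-boxplot rules. The function names, the choice of Bowley's skewness, and the constants `k`, `a`, and `b` are all illustrative assumptions.

```python
import math
import statistics


def bowley_skewness(q1, med, q3):
    """Quartile-based skewness in [-1, 1]; 0 for a symmetric middle half."""
    iqr = q3 - q1
    if iqr == 0:
        return 0.0
    return (q3 + q1 - 2 * med) / iqr


def skew_adjusted_fences(values, k=1.5, a=3.0, b=-4.0):
    """Return (lower, upper) fences that shift with the sample's skewness.

    Assumption: an exponential adjustment of the Tukey multiplier k.
    With zero skewness this reduces to the classical 1.5 * IQR rule.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    s = bowley_skewness(q1, med, q3)
    if s >= 0:
        # Right-skewed: stretch the upper fence, tighten the lower one.
        lower = q1 - k * math.exp(b * s) * iqr
        upper = q3 + k * math.exp(a * s) * iqr
    else:
        # Left-skewed: mirror image of the case above.
        lower = q1 - k * math.exp(-a * s) * iqr
        upper = q3 + k * math.exp(-b * s) * iqr
    return lower, upper


def flag_outliers(values, **kwargs):
    """Return the values that fall outside the skew-adjusted fences."""
    lower, upper = skew_adjusted_fences(values, **kwargs)
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical right-skewed sample, made up for illustration only.
    sample = [2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 40]
    print("adjusted fences:", skew_adjusted_fences(sample))
    print("adjusted rule flags:", flag_outliers(sample))
    print("plain Tukey flags:", flag_outliers(sample, a=0.0, b=0.0))
```

On a right-skewed sample, the stretched upper fence makes this rule more conservative in the long tail than the fixed 1.5 × IQR rule. That is the kind of behavior the abstract describes for high-skewness data, although the actual DTOD thresholds may be derived quite differently.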
Xiaowen Yang
Amjila Bam
Nubaira Rizvi
Stats
Louisiana State University Health Sciences Center New Orleans
Yang et al. studied this question.
synapsesocial.com/papers/69c37be2b34aaaeb1a67ec94 — DOI: https://doi.org/10.3390/stats9020033