Outlier detection is a fundamental component of data preprocessing and quality monitoring across diverse scientific domains, including engineering, biomedical sciences, and finance. While many variables in controlled environments approximate a normal distribution, real-world data, particularly biological, environmental, and epidemiological measures, are frequently characterized by pronounced right-skewness. To address the shortcomings of conventional methods, this study introduces the Dynamic Threshold for Outlier Detection (DTOD), which reframes outlier detection as a concrete operational workflow. The DTOD framework dynamically adjusts detection thresholds based on a functional relationship between skewness and tail morphology. Validation through large-scale simulation experiments across light-, middle-, and high-skewness levels confirms the method’s versatility. The DTOD proves particularly effective at two ends of the spectrum: enhancing sensitivity for detecting subtle anomalies in light-skewed data while serving as a conservative, high-confidence screening tool that controls false positives in high-skewness environments. In real-world application to North American Association of Central Cancer Registries (NAACCR) data, the method successfully identified outliers with abnormally high unknown tumor size rates in colorectal cancer and maintained a low misclassification rate in highly skewed lung cancer data. Ultimately, the DTOD provides a promising, interpretable solution for improving data quality in skewed scenarios.
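The abstract does not give the functional form DTOD uses to link skewness to its thresholds, so the following is not the authors' method. As a hedged illustration of the general idea of a threshold that moves with skewness, this minimal sketch implements a different, well-known rule: the medcouple-based adjusted boxplot of Hubert and Vandervieren, which widens the upper fence and tightens the lower fence as right-skewness grows. The function names `naive_medcouple` and `skew_adjusted_fences` are hypothetical, and the medcouple here is a simplified O(n^2) version that skips pairs tied exactly at the median.

```python
import math
import statistics


def naive_medcouple(values):
    """Simplified O(n^2) medcouple, a robust skewness measure in [-1, 1].

    Assumption: pairs tied exactly at the median are skipped rather than
    handled with the special tie kernel of the full definition.
    """
    xs = sorted(values)
    med = statistics.median(xs)
    lower = [x for x in xs if x <= med]
    upper = [x for x in xs if x >= med]
    kernel = [
        ((hi - med) - (med - lo)) / (hi - lo)
        for lo in lower
        for hi in upper
        if hi != lo
    ]
    return statistics.median(kernel) if kernel else 0.0


def skew_adjusted_fences(values):
    """Return (lower, upper) adjusted-boxplot fences for the sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mc = naive_medcouple(values)
    if mc >= 0:
        # Right-skewed: shrink the lower whisker, stretch the upper one.
        lo_k = 1.5 * math.exp(-4 * mc)
        hi_k = 1.5 * math.exp(3 * mc)
    else:
        # Left-skewed: the mirror-image adjustment.
        lo_k = 1.5 * math.exp(-3 * mc)
        hi_k = 1.5 * math.exp(4 * mc)
    return q1 - lo_k * iqr, q3 + hi_k * iqr


if __name__ == "__main__":
    # Deterministic right-skewed sample (exponentially growing values).
    sample = [math.exp(i / 10) for i in range(50)]
    low, high = skew_adjusted_fences(sample)
    flagged = [x for x in sample if x < low or x > high]
    print(f"fences: ({low:.2f}, {high:.2f}); flagged: {len(flagged)}")
```

For symmetric data the medcouple is zero and the fences reduce to the usual 1.5 × IQR rule, which makes the behavior easy to compare against a conventional boxplot.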
Yang et al. studied this question.
synapsesocial.com/papers/69c37be2b34aaaeb1a67ec94 — DOI: https://doi.org/10.3390/stats9020033
Xiaowen Yang
Amjila Bam
Nubaira Rizvi
Stats
Louisiana State University Health Sciences Center New Orleans