The primary objective of outlier detection is to identify values that differ significantly from the other data in a dataset. However, most current algorithms are effective only on small-scale data, and their performance depends heavily on the choice of parameters. To address this, the paper proposes an outlier detection algorithm, named FBOD, that is based on data features. Specifically, FBOD uses the number of combinations of attribute values in each tuple across the dataset as a key feature of that tuple. This feature reflects how strongly each data tuple differs from the dataset as a whole. By identifying tuples whose features differ significantly from those of the other objects in the dataset, FBOD can accurately determine outliers. Drawing on the notion of attribute importance from Rough Set Theory, FBOD can efficiently reduce data dimensionality, which allows it to handle high-dimensional datasets. Furthermore, the algorithm is implemented with a distributed parallel approach. The performance of FBOD is evaluated on several datasets, including standard and artificial datasets, and is compared with common outlier detection algorithms. Experimental results indicate that, compared with typical existing algorithms, FBOD achieves higher outlier detection accuracy without relying on prior knowledge of the datasets or on parameter selection. The results also demonstrate the suitability of FBOD for large-scale dataset processing on distributed platforms.
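The combination-count feature described above can be illustrated with a minimal sketch. This is not the FBOD algorithm itself, whose details the abstract does not give. It is one possible reading, assuming that a tuple's feature is the summed dataset-wide frequency of its attribute-value combinations. The function names `combination_frequency_scores` and `rank_outliers` and the combination size `k` are hypothetical and introduced only for illustration. The Rough Set attribute reduction and the distributed execution are omitted.

```python
from collections import Counter
from itertools import combinations


def combination_frequency_scores(rows, k=2):
    """Score each row by how common its attribute-value combinations are.

    Assumption (not from the paper): the score is the sum, over every
    k-sized combination of (attribute index, value) pairs in the row,
    of the number of rows in the dataset containing that combination.
    """
    counts = Counter()
    for row in rows:
        items = list(enumerate(row))  # pair each value with its attribute index
        for combo in combinations(items, k):
            counts[combo] += 1

    scores = []
    for row in rows:
        items = list(enumerate(row))
        scores.append(sum(counts[combo] for combo in combinations(items, k)))
    return scores


def rank_outliers(rows, k=2):
    """Return row indices ordered from rarest (most outlier-like) to most common."""
    scores = combination_frequency_scores(rows, k)
    return sorted(range(len(rows)), key=lambda i: scores[i])


# Toy categorical dataset: the last row shares no values with the others.
rows = [
    ("a", "x", "p"),
    ("a", "x", "p"),
    ("a", "x", "q"),
    ("a", "x", "p"),
    ("b", "y", "r"),
]

scores = combination_frequency_scores(rows)
ranking = rank_outliers(rows)
print(scores)   # [10, 10, 6, 10, 3]
print(ranking)  # [4, 2, 0, 1, 3]
```

In this toy data the last row gets the lowest score because none of its value pairs occurs anywhere else, so it is ranked first as the most outlier-like tuple. Note that the sketch introduces a combination size `k`, whereas the paper reports that FBOD does not rely on parameter selection.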
Zhao et al. (Sat,) studied this question.