Exploring High-Dimensional Outlier Detection: A Comprehensive Study on Methods and Applications Using PCA and k-NN Algorithm

Abstract

This study addresses outlier detection in high-dimensional data, a problem that matters in many applications. It reviews the literature on outlier detection methods, focusing on Angle-based Outlier Detection (ABOD), One-class Support Vector Machine (OSVM), Local Outlier Factor (LOF), Isolation Forest (IF), and AutoEncoder (AE). It also discusses the high-dimensionality problem and highlights Principal Component Analysis (PCA) as a means of dimensionality reduction. In the experimental phase, the features of a dataset are standardized after certain identifier columns are excluded. PCA is then applied, showing that 12 principal components retain 90% of the variance. Outliers are subsequently detected with the k-Nearest Neighbors (k-NN) algorithm, with attention to the effect of different k values and distance metrics. The analysis is then extended to a dataset of 20,058 chess matches, covering descriptive statistics and exploratory data analysis, followed again by PCA for dimensionality reduction and k-NN outlier detection. The findings describe the dataset's characteristics, including key statistical metrics and potential outliers.
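
The pipeline described in the abstract (standardize, reduce with PCA to a 90% variance target, then score points by their distance to the k-th nearest neighbor) can be sketched as follows. This is a minimal illustration using NumPy only, not the authors' implementation: the function names `pca_reduce` and `knn_outlier_scores`, the synthetic 12-feature data, the planted outlier, and the choice k=5 are all assumptions made for the example, and the number of components it keeps is unrelated to the 12 reported in the study.

```python
import numpy as np


def pca_reduce(X, variance_to_keep=0.90):
    """Standardize X, then project it onto the fewest principal components
    whose cumulative explained variance reaches variance_to_keep."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant columns unscaled
    Z = (X - mu) / sigma
    # SVD of the standardized data: rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    n_components = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    n_components = min(n_components, len(S))
    return Z @ Vt[:n_components].T, n_components


def knn_outlier_scores(X, k=5, metric="euclidean"):
    """Score each row by its distance to its k-th nearest neighbor.
    Larger scores indicate more isolated points. Builds the full pairwise
    distance matrix, so it is only suitable for small datasets."""
    diff = X[:, None, :] - X[None, :, :]
    if metric == "euclidean":
        D = np.sqrt((diff**2).sum(axis=-1))
    elif metric == "manhattan":
        D = np.abs(diff).sum(axis=-1)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    return np.sort(D, axis=1)[:, k - 1]


# Synthetic data (an assumption for this sketch): 3 latent factors, each
# repeated across 4 noisy features, plus one planted outlier in the last row.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
inliers = np.repeat(latent, 4, axis=1) + 0.3 * rng.normal(size=(200, 12))
X = np.vstack([inliers, np.full((1, 12), 10.0)])

reduced, n_components = pca_reduce(X, variance_to_keep=0.90)
scores = knn_outlier_scores(reduced, k=5, metric="euclidean")

print("components kept:", n_components)
print("most isolated row:", int(np.argmax(scores)))
```

Rerunning `knn_outlier_scores` with different `k` values or with `metric="manhattan"` is one way to explore the sensitivity to k and to the distance metric that the abstract mentions. For a dataset the size of the chess data, a tree-based or approximate neighbor search would be a more practical choice than the dense distance matrix used here.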
