What question did this study set out to answer?

This research aims to enhance data quality in AI-driven communication systems by effectively addressing missing values using advanced preprocessing techniques.

June 19, 2026 · Open Access

Improving data quality using preprocessing in various missingness mechanisms for AI-enabled communication systems

Key Points

Developed a novel imputation technique based on linear regression and clustering methods.
Evaluated algorithm performance on four datasets with various missing value ratios under MNAR, MAR, and MCAR mechanisms.
Compared results with existing imputation methods using MAE, RMSE, and R² score.
Achieved lower MAE and RMSE than the existing imputation methods it was compared with.
Demonstrated higher R² scores, indicating more accurate imputation.
Required less computational time than the compared methods, enhancing practical utility in real-time applications.
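
To make the evaluation setup above concrete, the following is a minimal sketch of how missingness masks for the three mechanisms might be generated and how MAE, RMSE, and R² can be computed on the hidden entries. It is an illustration under stated assumptions, not the paper's protocol: the function names (`make_mask`, `mae`, `rmse`, `r2`), the two-column row layout `(x, y)`, and the simplified top-k rule used for MAR and MNAR are all hypothetical choices made here.

```python
import math
import random


def mae(truth, pred):
    """Mean absolute error between true and imputed values."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def rmse(truth, pred):
    """Root-mean-square error between true and imputed values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def r2(truth, pred):
    """Coefficient of determination (assumes truth is not constant)."""
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def make_mask(rows, mechanism, ratio, rng):
    """Return a list of booleans, True where y should be hidden.

    rows is a list of (x, y) pairs; x is always observed (an assumption
    of this sketch). A fraction `ratio` of the y values is hidden.
    """
    n = len(rows)
    k = round(ratio * n)
    if mechanism == "MCAR":
        # Missingness is independent of both x and y.
        hidden = set(rng.sample(range(n), k))
    elif mechanism == "MAR":
        # Missingness depends only on the observed x (simplified: top-k x).
        order = sorted(range(n), key=lambda i: rows[i][0], reverse=True)
        hidden = set(order[:k])
    elif mechanism == "MNAR":
        # Missingness depends on the hidden value y itself (top-k y).
        order = sorted(range(n), key=lambda i: rows[i][1], reverse=True)
        hidden = set(order[:k])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [i in hidden for i in range(n)]
```

In use, one would hide the masked values, run an imputation method, and pass the true and imputed values at the masked positions to `mae`, `rmse`, and `r2`. Real MAR and MNAR generators are usually probabilistic rather than a hard top-k cut; the deterministic rule here only keeps the sketch short.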

Abstract

Preprocessing data is an essential task in real-world data analysis, particularly for AI-driven applications in modern communication systems, where data quality directly impacts learning efficiency and decision-making accuracy. Missing values are a common issue in datasets collected from communication networks and distributed intelligent environments. They must therefore be handled carefully during the preprocessing stage, using appropriate imputation methods, to improve the performance and accuracy of data mining and artificial intelligence models and to ensure reliable and trustworthy AI-based communication services. To this end, this paper proposes a novel technique aimed at obtaining high-quality data by effectively handling missing values in the dataset under consideration. The proposed algorithm primarily relies on linear regression and further benefits from clustering techniques to group closely related instances, which enhances the precision of the imputation process. The performance of the proposed imputation method is evaluated using four datasets with varying sizes and missing value ratios, generated under three different missingness mechanisms: Missing Not at Random (MNAR), Missing at Random (MAR), and Missing Completely at Random (MCAR). The proposed method is compared with existing imputation techniques in terms of mean absolute error (MAE), root-mean-square error (RMSE), and the coefficient of determination (R² score). Experimental results demonstrate that the proposed method requires less computational time while achieving higher accuracy, making it suitable for data preprocessing in AI-enabled communication and intelligent network environments.
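
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea it names: group similar instances by clustering, then fit a linear regression within each group to fill in missing values. Everything specific here is an assumption made for illustration, including the names `kmeans_1d`, `fit_line`, and `impute`, the use of a single observed feature `x`, one-dimensional k-means with quantile initialisation, and the fallback to a global fit for sparse clusters. It should not be read as the authors' method.

```python
def kmeans_1d(values, k, iters=20):
    """Cluster scalars with a basic k-means; returns one label per value."""
    srt = sorted(values)
    # Deterministic initialisation: centroids at evenly spaced quantiles.
    centroids = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]

    def assign():
        return [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return assign()


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # no spread in x: predict the mean of y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def impute(rows, k):
    """Fill missing y (None) in rows of (x, y) using per-cluster regression."""
    complete = [(x, y) for x, y in rows if y is not None]
    if not complete:
        raise ValueError("need at least one complete row")
    global_fit = fit_line([x for x, _ in complete], [y for _, y in complete])
    labels = kmeans_1d([x for x, _ in rows], k)

    fits = {}
    for j in range(k):
        pts = [(x, y) for (x, y), lab in zip(rows, labels)
               if lab == j and y is not None]
        # Fall back to the global line when a cluster has too few complete rows.
        fits[j] = (fit_line([p[0] for p in pts], [p[1] for p in pts])
                   if len(pts) >= 2 else global_fit)

    filled = []
    for (x, y), lab in zip(rows, labels):
        if y is None:
            slope, intercept = fits[lab]
            y = slope * x + intercept
        filled.append((x, y))
    return filled
```

The design point this is meant to show is that a single global regression can fit poorly when the data contain groups with different local relationships, whereas fitting within clusters lets each group have its own line. Clustering here uses only the observed feature, so rows with a missing target can still be assigned to a group.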
