Identifying the data distribution and normalizing the data are fundamental steps in data pre-processing and statistical modeling. Identifying the probability distribution is essential for selecting an appropriate statistical model, whereas normalization transforms data onto a comparable scale to improve the performance of machine learning algorithms. Although normalization techniques are used extensively in data pre-processing, they are generally applied without examining their impact on the underlying probability distribution of the data. This study systematically investigates how normalization methods influence probability distribution behavior and distributional transformation. Datasets drawn from four popular probability distributions, Gaussian (GD), Exponential (ED), Weibull (WD), and Lognormal (LD), are subjected to fourteen normalization methods. After normalization, the transformed observations are refitted to nine candidate probability distributions and compared using statistical goodness-of-fit metrics and estimated distributional parameters. The proposed methodology offers a comparative examination of how different normalization techniques affect the data’s probabilistic properties, parameter estimation behavior, and distributional structure. The study further identifies normalization-specific distributional transition behavior and validates the results using independent univariate subsets of a publicly available real-world dataset. The experimental results show that normalization can significantly alter the original probability distribution: ED, WD, and LD datasets show significant distributional changes under several normalization methods, while GD data largely preserve their normality under the TH, HT, LS, and PNN normalization techniques.
This work’s primary contribution is a systematic, distribution-aware normalization analysis methodology that links probabilistic modeling behavior with data pre-processing. The results offer practical guidance for choosing appropriate normalization methods in machine learning, statistical inference, risk analysis, Monte Carlo simulation, and predictive modeling, thereby improving the interpretability and reliability of the resulting models.
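The normalize-then-refit procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three normalizers (`z_score`, `min_max`, `tanh_squash`), the two candidate families, and the Kolmogorov-Smirnov statistic as the goodness-of-fit metric are all assumptions chosen for brevity, whereas the study uses fourteen normalization methods and nine candidate distributions that are not specified here.

```python
import math
import random
import statistics

# --- Illustrative normalizers (assumed; not the study's fourteen methods) ---
def z_score(xs):
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def tanh_squash(xs):
    # Hypothetical nonlinear normalizer: tanh applied to z-scores.
    return [math.tanh(z) for z in z_score(xs)]

# --- Candidate distributions, each fitted by simple closed-form estimates ---
def fit_gaussian(xs):
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def fit_shifted_exponential(xs):
    loc = min(xs)                       # location estimate
    scale = statistics.fmean(xs) - loc  # scale estimate
    return lambda x: 1.0 - math.exp(-(x - loc) / scale) if x > loc else 0.0

CANDIDATES = {
    "gaussian": fit_gaussian,
    "shifted_exponential": fit_shifted_exponential,
}

def ks_statistic(xs, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    s = sorted(xs)
    n = len(s)
    d = 0.0
    for i, x in enumerate(s):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def best_fit(xs):
    """Refit every candidate and return (best name, {name: KS statistic})."""
    scores = {name: ks_statistic(xs, fit(xs)) for name, fit in CANDIDATES.items()}
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    random.seed(0)
    sample = [random.expovariate(1.0) for _ in range(2000)]  # synthetic ED-like data
    for label, transform in [("raw", list), ("z_score", z_score),
                             ("min_max", min_max), ("tanh_squash", tanh_squash)]:
        name, scores = best_fit(transform(sample))
        print(label, name, {k: round(v, 4) for k, v in scores.items()})
```

In this sketch, affine normalizers such as z-score and min-max leave the KS statistics unchanged because both candidate families are location-scale families, so any change in the best-fitting candidate can only come from a nonlinear transformation. Whether this matches the transition behavior reported in the study depends on its actual normalization methods and candidate set.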
Mand et al. (Wed,) studied this question.