Machine learning has been widely used for concrete compressive strength prediction, yet previous studies have focused mainly on algorithm comparison and isolated feature-processing strategies. The coupled influence of dataset characteristics on prediction error has received less systematic attention. This study investigates concrete strength prediction from a data-structure perspective by examining three structural variables: sample size, feature size, and compressive strength range. A unified experimental framework was constructed using 15 concrete datasets. Four filtering methods (correlation, partial correlation, information entropy, and Relief) were employed to reorganize feature subsets, and the resulting error trends were evaluated using artificial neural network (ANN), support vector regression (SVR), and random forest (RF) models. The results show that prediction error generally first decreases and then stabilizes as feature size increases, although the location of the low-error region depends on the dataset and the filtering method. A larger sample size is associated with improved prediction stability, whereas a wider strength range tends to increase prediction difficulty. Based on these observations, an empirical relationship was established to describe the joint effect of sample size, feature size, and strength range on prediction error. The findings indicate that the attainable error level in concrete strength prediction is controlled not only by model form but also by dataset organization and feature configuration. Within the present framework, the study provides a practical basis for designing feature systems and interpreting model performance across datasets with different structural characteristics.
Mo et al. studied this question.
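The feature-size experiment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the data are synthetic (six features, of which only the first three carry signal), the filter is the absolute Pearson correlation with the target (one of the four filters named above), and a small k-nearest-neighbour regressor stands in for the ANN, SVR, and RF models so that the example needs only the Python standard library. All function names (`rank_features`, `rmse_by_feature_count`, `make_synthetic`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def rank_features(X, y):
    """Return feature indices sorted by descending |r| with the target."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def knn_predict(X_train, y_train, x, k=5):
    """Mean target of the k nearest training points (stand-in model)."""
    nearest = sorted((math.dist(row, x), t) for row, t in zip(X_train, y_train))
    return sum(t for _, t in nearest[:k]) / k


def rmse_by_feature_count(X_train, y_train, X_test, y_test, order, k=5):
    """Test RMSE when using the top 1, 2, ..., len(order) ranked features."""
    errors = []
    for m in range(1, len(order) + 1):
        cols = order[:m]
        train = [[row[j] for j in cols] for row in X_train]
        test = [[row[j] for j in cols] for row in X_test]
        sq = [(knn_predict(train, y_train, x, k) - t) ** 2
              for x, t in zip(test, y_test)]
        errors.append(math.sqrt(sum(sq) / len(sq)))
    return errors


def make_synthetic(n, seed=0):
    """Synthetic data (assumption): only features 0, 1, 2 are informative."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.uniform(0.0, 1.0) for _ in range(6)]
        target = 30 * row[0] + 20 * row[1] + 10 * row[2] + rng.gauss(0.0, 1.0)
        X.append(row)
        y.append(target)
    return X, y


X, y = make_synthetic(300)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

# Rank on the training split only, then trace error against feature size.
order = rank_features(X_train, y_train)
errors = rmse_by_feature_count(X_train, y_train, X_test, y_test, order)

for m, e in enumerate(errors, start=1):
    print(f"top {m} feature(s): RMSE = {e:.2f}")
```

Repeating the loop with a different filter or a different regressor gives one error curve per combination, which is how the dependence of the low-error region on the dataset and filtering method could be compared. Ranking on the training split alone keeps the test error free of selection leakage.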