What question did this study set out to answer?

This research examines how dataset characteristics, including sample size and feature size, affect prediction error in concrete compressive strength models.

April 16, 2026 · Open Access

Influence of Data Structure on Prediction Error in Machine Learning-Based Concrete Compressive Strength Models

Key Points

Analyzed 15 concrete datasets to explore dataset characteristics.
Used correlation, partial correlation, information entropy, and relief to filter and reorganize feature subsets.
Evaluated prediction errors using ANN, SVR, and RF models.
Prediction error decreases initially with increasing feature size, then stabilizes.
Larger sample sizes improve prediction stability.
Wider compressive strength ranges tend to make prediction more difficult.

Abstract

Machine learning has been widely used for concrete compressive strength prediction, yet previous studies have focused mainly on algorithm comparison and isolated feature-processing strategies. The coupled influence of dataset characteristics on prediction error has received less systematic attention. This study investigates concrete strength prediction from a data structure perspective by examining three structural variables, namely, sample size, feature size, and compressive strength range. A unified experimental framework was constructed using 15 concrete datasets. Correlation, partial correlation, information entropy, and relief were employed to reorganize feature subsets, and the resulting error trends were evaluated using artificial neural network (ANN), support vector regression (SVR), and random forest (RF) models. The results show that prediction error generally decreases first and then becomes stable as feature size increases, although the location of the low-error region depends on the dataset and the filtering method. Larger sample size is associated with improved prediction stability, whereas wider strength range tends to increase prediction difficulty. Based on these observations, an empirical relationship was established to describe the joint effect of sample size, feature size, and strength range on prediction error. The findings indicate that the attainable error level in concrete strength prediction is controlled not only by model form but also by dataset organization and feature configuration. Within the present framework, the study provides a practical basis for designing feature systems and interpreting model performance across datasets with different structural characteristics.

Cite This Study

Mo et al. studied this question.

synapsesocial.com/papers/69e07cc02f7e8953b7cbddeb https://doi.org/10.3390/buildings16081537
