Abstract
Predicting the friction behavior of diamond-like carbon (DLC) coatings remains a key challenge in tribology because of the complex interplay of test conditions, material properties, and experimental variability. Although literature data are abundant, they are often non-standardized and reported under highly variable conditions, which hinders their systematic reuse for predictive modeling. This study introduces a machine learning (ML) framework that exploits such heterogeneous data with a focus on physical relevance and robustness. A dataset of approximately 4100 data points (including 410 friction coefficient values) was compiled from an extensive literature review. Two modeling scenarios are defined: the first uses mechanical, structural, and tribological descriptors; the second adds chemical composition features, offering more detail but reducing dataset size. Six ML models are trained under standardized conditions to predict the friction coefficient, and their performance is assessed using standard metrics. Extra Trees (ET) and Artificial Neural Networks (ANNs) achieve the highest performance. SHAP (SHapley Additive exPlanations) analysis identifies temperature and Hertz pressure as the dominant predictors, consistent with tribological observations. Incorporating chemical composition improves prediction accuracy but reduces dataset size, highlighting a key trade-off between data completeness and feature richness. With chemical composition included, SHAP analysis shows that temperature and Hertz pressure remain key predictors while the importance of humidity increases, suggesting that chemical inputs enhance not only accuracy but also the physical interpretability of the models. The results demonstrate that literature-based data can support robust and physically meaningful friction modeling when feature richness is balanced with careful control of data quality.
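The trade-off between feature richness and dataset size described in the abstract can be illustrated with a minimal sketch of complete-case filtering, assuming that a record is usable in a scenario only when every feature of that scenario and the target are reported. The feature names, the helper `complete_cases`, and the toy records are hypothetical and are not taken from the study.

```python
# Hypothetical descriptor names; the abstract does not list the actual features.
BASE_FEATURES = ["temperature_C", "hertz_pressure_GPa", "humidity_pct", "hardness_GPa"]
CHEM_FEATURES = ["hydrogen_at_pct", "sp3_fraction"]
TARGET = "friction_coefficient"


def complete_cases(records, features, target=TARGET):
    """Keep only records in which every requested feature and the target are present."""
    needed = list(features) + [target]
    return [r for r in records if all(r.get(k) is not None for k in needed)]


# Toy records for illustration only (not data from the study).
records = [
    # Fully reported record: usable in both scenarios.
    {"temperature_C": 25, "hertz_pressure_GPa": 1.0, "humidity_pct": 40,
     "hardness_GPa": 20, "hydrogen_at_pct": 30, "sp3_fraction": 0.5,
     "friction_coefficient": 0.10},
    # Hydrogen content explicitly missing: usable in scenario 1 only.
    {"temperature_C": 100, "hertz_pressure_GPa": 0.8, "humidity_pct": 10,
     "hardness_GPa": 15, "hydrogen_at_pct": None, "sp3_fraction": 0.4,
     "friction_coefficient": 0.05},
    # No chemical composition reported at all: usable in scenario 1 only.
    {"temperature_C": 200, "hertz_pressure_GPa": 1.2, "humidity_pct": 5,
     "hardness_GPa": 25, "friction_coefficient": 0.20},
    # Humidity not reported: unusable in both scenarios.
    {"temperature_C": 25, "hertz_pressure_GPa": 0.9, "humidity_pct": None,
     "hardness_GPa": 18, "hydrogen_at_pct": 20, "sp3_fraction": 0.6,
     "friction_coefficient": 0.15},
]

scenario_1 = complete_cases(records, BASE_FEATURES)
scenario_2 = complete_cases(records, BASE_FEATURES + CHEM_FEATURES)

print(len(scenario_1), len(scenario_2))  # prints: 3 1
```

In this toy example, requiring the chemical features shrinks the usable set from three records to one, which mirrors the reduction in dataset size noted for the second scenario. Imputing missing values instead of dropping records would be an alternative design choice, but it is not one the abstract describes.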
Cherguy et al. (Mon,) studied this question.