Heat treatment controls microstructure and mechanical properties in engineering alloys, but optimizing it experimentally is costly and slow. Machine learning (ML) offers a data-driven alternative, though data scarcity and feature leakage often limit predictive reliability. An ML framework was developed and validated on a physics-informed synthetic dataset of 332 heat-treated alloy samples covering carbon steels (AISI 4140, 1080, 4340, 5130), aluminum alloys (AlSi7Mg, AlSi10Mg, Al6061, Al2618), and stainless steels (304, 316L). Twenty-seven features describing chemical composition, heat-treatment parameters, and microstructural characteristics were initially included. Following a strict data-leakage analysis, all six mechanical property features were removed, leaving 22 independent predictors. Five regression models (Extra Trees, Random Forest, Gradient Boosting, Ridge, and ElasticNet) were evaluated using a 70/15/15 train–validation–test split with randomized hyperparameter optimization and 3-fold cross-validation. The Random Forest model gave the best test performance for tensile strength prediction (R² = 0.9282, RMSE = 37.24 MPa, MAE = 28.54 MPa, MAPE = 5.39%), with minimal overfitting. Tempering temperature, carbon content, and manganese content were the most influential features, consistent with established metallurgical principles. The framework demonstrates leakage-free prediction of mechanical properties from composition and processing parameters and offers a scalable approach to accelerated alloy design. This study is a methodological demonstration: the reported metrics are benchmarks on the synthetic dataset, and experimental validation with real alloy data remains essential before industrial deployment.
Tiwari et al. (Fri,) studied this question.
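The evaluation protocol summarized in the abstract (removing leaky features, a 70/15/15 split, and the four reported error metrics) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the column names in LEAKY_COLUMNS, the helper names, the rounding rule for the split sizes, and the seed are all assumptions made for illustration, and the abstract does not list the six removed features.

```python
import math
import random

# Hypothetical names for mechanical-property columns treated as leaky.
# The abstract says six such features were removed but does not name them.
LEAKY_COLUMNS = {"hardness_hv", "yield_strength_mpa"}


def drop_leaky_features(row, leaky=LEAKY_COLUMNS):
    """Return a copy of one sample (a dict) without the leaky columns."""
    return {k: v for k, v in row.items() if k not in leaky}


def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle 0..n-1 and cut it into train/validation/test index lists.

    Train and validation sizes are rounded from the first two fractions;
    the test set takes whatever remains (an assumed convention).
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def regression_metrics(y_true, y_pred):
    """Compute R^2, RMSE, MAE, and MAPE (in percent) for paired values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        # MAPE divides by the true value, so it assumes no zero targets.
        "mape": 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }


if __name__ == "__main__":
    train, val, test = split_indices(332)
    print(len(train), len(val), len(test))
    print(regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0]))
```

With 332 samples, this rounding convention gives 232 training, 50 validation, and 50 test samples. The split actually used in the study may differ in how it rounds or stratifies. In practice the model fitting and randomized hyperparameter search would come from a library such as scikit-learn, with the leaky columns removed before any split so that no target-derived information reaches the predictors.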