The growing volume of soccer data makes performance evaluation with classical methods difficult, and recent machine learning approaches can help handle such complex datasets in sports. The present study aims to determine which performance variables contribute most strongly to game outcomes in elite male soccer, using machine learning models trained under different venue conditions. Technical, tactical, and physical variables obtained from 542 matches played over two consecutive seasons were used to predict results under three venue conditions (all, home, and away). Variance Inflation Factor (VIF) analysis and BorutaPy were applied for feature selection before extreme gradient boosting (XGBoost) modeling. Feature importance rankings and SHAP analysis were used to identify the variables affecting model performance and outputs across conditions. The models predicted game outcomes with high accuracy, especially for wins and losses (93.38-95.93%), and with lower accuracy for draws (68.99-88.46%). The variables with the greatest impact on the models' predictions were Conversion rate, Opponent's xG per goal, Opponent's xG, and xG Conversion. Prediction performance differed by game outcome, and draws were the hardest to predict in the competition studied. Technical variables contributed the most to the models and their outputs. Coaches should consider the structure and needs of their own competitions when evaluating the data they possess. Future research could develop models for different tournaments, especially using time-related variables where available.
Karakoç et al. studied this question.
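
The abstract names Variance Inflation Factor analysis as the first filtering step but gives no details of how it was applied. The following is a minimal sketch of one common way to do it, not the authors' implementation: each feature is regressed on the others, VIF is computed as 1 / (1 - R²), and the feature with the highest VIF is dropped until all remaining values fall at or below a cutoff. The cutoff of 10, the function names `vif_scores` and `drop_high_vif`, and the synthetic columns are all assumptions made for illustration. The later BorutaPy, XGBoost, and SHAP steps are not shown.

```python
import numpy as np


def vif_scores(X):
    """Return the VIF of every column of X (n_samples x n_features).

    Assumes no column is constant; a constant column has no variance
    to explain and would need to be removed beforehand.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Ordinary least squares of column j on the other columns plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        # Perfect collinearity gives R^2 = 1, i.e. an unbounded VIF.
        scores[j] = np.inf if 1.0 - r2 < 1e-12 else 1.0 / (1.0 - r2)
    return scores


def drop_high_vif(X, names, threshold=10.0):
    """Iteratively drop the highest-VIF column until all VIFs <= threshold.

    The threshold of 10 is a common rule of thumb, assumed here;
    the source does not state which cutoff was used.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        scores = vif_scores(X)
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# Synthetic illustration only: "xg_scaled" is almost a multiple of "xg",
# so one of the two should be filtered out. These are not the study's data.
rng = np.random.default_rng(0)
shots = rng.normal(size=200)
xg = rng.normal(size=200)
xg_scaled = 2.0 * xg + rng.normal(scale=0.01, size=200)
X = np.column_stack([shots, xg, xg_scaled])

X_kept, kept = drop_high_vif(X, ["shots", "xg", "xg_scaled"])
print(kept)
```

Dropping one feature at a time and recomputing matters because removing a single collinear variable changes the VIF of every variable it was correlated with. Computing all VIFs once and discarding everything above the cutoff would tend to remove both members of a redundant pair.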