Coffee is a major global commodity, and specialty coffees are valued for their quality, which is assessed through standardized sensory protocols. The Specialty Coffee Association (SCA) score is a key indicator of commercial value, but sensory evaluation is resource-intensive and subject to variability. This study developed predictive models to estimate SCA scores from processing and production-related variables collected between 2019 and 2023, covering reception, fermentation, pulping, washing, drying, storage, and contextual production information. Random Forest (RF) and XGBoost (XGB) regression algorithms were applied using three approaches: the complete variable set, Principal Component Analysis (PCA), and selection of the seven most relevant variables. The RF model with all variables achieved the best performance (MAE = 0.80; RMSE = 1.03; R² = 0.53). However, models using only seven predictors achieved nearly equivalent results (MAE = 0.81; RMSE = 1.06; R² = 0.50), with both RF and XGB showing RMSE around 1.05 and R² above 0.50. PCA-based models performed worse. In conclusion, variable selection proved more efficient and robust than PCA, enabling moderate but practically relevant prediction of SCA scores with reduced model complexity in specialty coffee production.

PRACTICAL APPLICATIONS: This research shows that machine learning models can help predict coffee quality scores from processing data. Such tools may support producers and cooperatives in monitoring quality earlier and more efficiently, reducing reliance on extensive sensory tests and improving decision-making in specialty coffee production.
Ferraz et al. studied this question.
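The three-way comparison described in the abstract (all variables, PCA, and a seven-variable subset, each scored with MAE, RMSE, and R²) can be illustrated with a minimal sketch. Everything specific here is an assumption and not the study's pipeline: the data are synthetic, only a Random Forest is shown (XGBoost is omitted to keep dependencies small), the PCA step keeps seven components, and the seven variables are picked by Random Forest feature importance, which may differ from the selection criterion the authors used.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the processing/production table (NOT the study's data).
rng = np.random.default_rng(0)
n_samples, n_features = 300, 20
X = rng.normal(size=(n_samples, n_features))
# Hypothetical score: driven by a few columns plus noise, centred near 84 points.
y = 84 + 1.0 * X[:, 0] - 0.8 * X[:, 3] + 0.5 * X[:, 7] + rng.normal(size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def make_rf():
    """Fresh Random Forest with fixed seed (hyperparameters are assumptions)."""
    return RandomForestRegressor(n_estimators=100, random_state=0)


def evaluate(model, X_tr, X_te):
    """Fit on the training split and report MAE, RMSE and R2 on the test split."""
    model.fit(X_tr, y_train)
    pred = model.predict(X_te)
    return {
        "MAE": float(mean_absolute_error(y_test, pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": float(r2_score(y_test, pred)),
    }


results = {}

# Approach 1: complete variable set.
full_rf = make_rf()
results["all variables"] = evaluate(full_rf, X_train, X_test)

# Approach 2: PCA on standardized inputs (7 components is an assumed choice).
pca_rf = make_pipeline(StandardScaler(), PCA(n_components=7), make_rf())
results["PCA"] = evaluate(pca_rf, X_train, X_test)

# Approach 3: keep the 7 variables ranked highest by the full model's importances.
top7 = np.argsort(full_rf.feature_importances_)[::-1][:7]
results["top 7 variables"] = evaluate(make_rf(), X_train[:, top7], X_test[:, top7])

for name, metrics in results.items():
    print(f"{name:16s} MAE={metrics['MAE']:.2f} "
          f"RMSE={metrics['RMSE']:.2f} R2={metrics['R2']:.2f}")
```

The variable ranking is computed on the training split only, so the held-out metrics for the reduced model are not influenced by test data. The numbers printed for this synthetic table say nothing about the values reported in the abstract.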