What does this research mean for the field?

Ensemble learning algorithms, particularly XGBoost, achieve higher accuracy and predictive performance for consumer purchase intention than the traditional models tested, while logistic regression remains the most interpretable. Novelty: confirmatory. Consensus alignment: neutral.

What question did this study set out to answer?

This study aims to evaluate the effectiveness of various machine learning algorithms in predicting consumer purchase intentions.

June 4, 2026 · Open Access

Consumer Behavior Data Mining and Analysis Using Machine Learning Algorithms

Key Points

Selected four algorithms: logistic regression, support vector machine, random forest, and XGBoost.
Conducted data preprocessing and feature engineering on a real e-commerce dataset.
Evaluated model performance using accuracy, F1 score, and AUC metrics.
XGBoost achieved the highest accuracy, F1 score, and AUC of the tested algorithms.
Random forest offered a good balance between stability and efficiency.
Logistic regression offered the best interpretability despite lower predictive accuracy.
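The three evaluation metrics listed above can be made concrete with a small, dependency-free sketch. It assumes binary labels (1 = purchase, 0 = no purchase), one predicted purchase probability per sample, and an illustrative 0.5 decision threshold; none of these details, nor the function names, come from the paper, and the authors may well have used a library such as scikit-learn (`accuracy_score`, `f1_score`, `roc_auc_score`) instead.

```python
"""Minimal, dependency-free sketches of accuracy, F1 score and ROC AUC.

Assumptions (not from the paper): binary labels (1 = purchase, 0 = no
purchase), one predicted probability per sample, and a 0.5 threshold.
"""
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Fraction of samples whose predicted label matches the true label.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Harmonic mean of precision and recall for the positive (purchase) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # AUC as the probability that a random positive outranks a random
    # negative, counting ties as half (normalised Mann-Whitney U statistic).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    # Threshold the probabilities for accuracy/F1; AUC uses the raw scores.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "accuracy": accuracy(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc(y_true, scores),
    }
```

For example, `evaluate([1, 0, 1, 1, 0, 0], [0.9, 0.3, 0.6, 0.4, 0.2, 0.7])` gives accuracy 4/6, F1 2/3, and AUC 7/9. AUC is computed from the ranking of scores rather than thresholded labels, which is why it can differ noticeably from accuracy on the same predictions.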

Abstract

In the era of the digital economy, the vast amount of consumer online behavior data offers unprecedented opportunities to gain accurate insight into market demand and to predict individual behavior. This study systematically explores and compares the effectiveness of different machine learning algorithms for consumer behavior data mining and analysis. Focusing on the core task of predicting customers' future purchase intention, the research selects four typical algorithms (logistic regression, support vector machine, random forest, and XGBoost) and builds a complete analysis pipeline, from data preprocessing and feature engineering to model training and evaluation, on a real e-commerce dataset. The paper systematically reviews the evolution from classical behavior theory to modern data mining techniques. Methodologically, it describes in detail the experimental setup, data cleaning, feature construction (including RFM, i.e. recency, frequency, and monetary value, and extended features), and model implementation. The experimental results are presented in a comprehensive performance table, an efficiency comparison table, and a feature importance table. The analysis shows that XGBoost performs best on accuracy, F1 score, AUC, and other key metrics, demonstrating a strong ability to model complex nonlinear relationships; random forest strikes a good balance between stability and efficiency; and logistic regression remains the most interpretable. The study not only confirms the advantage of ensemble learning in consumer behavior prediction but also provides empirical evidence and selection guidance for enterprises weighing accuracy, efficiency, and interpretability against one another.
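The RFM features mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical transaction log of `(customer_id, date, amount)` records and measures recency in days before a chosen snapshot date; the paper's actual schema and its extended features are not described in this summary, so the average-order-value column is only a hypothetical example of an extension.

```python
"""Hypothetical RFM feature construction from a raw transaction log.

Assumptions (not from the paper): each record is (customer_id, date, amount),
recency is counted in days before a snapshot date, and avg_order_value is an
illustrative 'extended' feature.
"""
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Transaction = Tuple[str, date, float]


def rfm_features(transactions: Iterable[Transaction],
                 snapshot: date) -> Dict[str, Dict[str, float]]:
    last_seen: Dict[str, date] = {}
    frequency: Dict[str, int] = defaultdict(int)
    monetary: Dict[str, float] = defaultdict(float)

    for customer, day, amount in transactions:
        # Recency: remember each customer's most recent purchase date.
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        # Frequency: number of transactions per customer.
        frequency[customer] += 1
        # Monetary: total spend per customer.
        monetary[customer] += amount

    return {
        c: {
            "recency_days": (snapshot - last_seen[c]).days,
            "frequency": frequency[c],
            "monetary": monetary[c],
            # Hypothetical extended feature: mean spend per transaction.
            "avg_order_value": monetary[c] / frequency[c],
        }
        for c in last_seen
    }
```

With a log of `("c1", date(2024, 1, 1), 50.0)`, `("c1", date(2024, 1, 20), 30.0)`, and `("c2", date(2023, 12, 15), 100.0)` and a snapshot of `date(2024, 2, 1)`, customer `c1` gets recency 12 days, frequency 2, and monetary 80.0, while `c2` gets recency 48 days, frequency 1, and monetary 100.0. Each customer becomes one feature row that any of the four classifiers could consume.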
