The highly sparse and dynamic nature of e-commerce transactions presents a significant challenge for accurately predicting customer intent and the composition of future purchases. While traditional Market Basket Analysis (MBA) and collaborative filtering models provide foundational insights, they often fail to integrate preference history, semantic features and propensity features that could help explain user purchase behaviour. To address this, this study proposes KALEFormer (KMeans And LDA Enriched Transformer for Market Basket Analysis), an integrated framework that combines machine learning, natural language processing and deep learning to improve next-basket predictions. Experiments are conducted on the UCI Online Retail dataset. Machine learning techniques are first used for data preprocessing, dimensionality reduction via PCA and customer segmentation via KMeans clustering. Latent thematic information is then captured by applying LDA topic modelling to item descriptions, and semantic item representations are derived using MiniLM sentence embeddings. These enriched signals are subsequently fused by an encoder-only transformer model to learn basket-level purchasing patterns. The objective is to improve recommendation by jointly modelling customer tendencies and semantic context rather than relying solely on item co-occurrence. Evaluation against state-of-the-art models shows that KALEFormer performs effectively even in a sparse recommendation setting, achieving the highest scores among the compared models: HR@k of 80.69% and 90.54% and NDCG@k of 38.01% and 35.16% for k = 10 and k = 50, respectively. Additional train–test analysis indicates stable generalization, and an ablation test confirms that the inclusion of contextual metadata features contributes to significant performance improvements.
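For readers unfamiliar with the reported metrics, the following is a minimal sketch of how HR@k and NDCG@k are commonly computed for a ranked recommendation list with binary relevance. The exact evaluation protocol of the study is not given here, so the definitions below (a hit if any held-out basket item appears in the top k, and DCG normalised by the ideal ordering) are assumptions, and the function names `hit_rate_at_k`, `ndcg_at_k` and `evaluate` are hypothetical.

```python
import math


def hit_rate_at_k(ranked_items, relevant_items, k):
    """Return 1.0 if any relevant item appears in the top-k, else 0.0."""
    return 1.0 if any(item in relevant_items for item in ranked_items[:k]) else 0.0


def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A relevant item at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant_items), k)
    if ideal_hits == 0:
        return 0.0
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg


def evaluate(predictions, ground_truth, k):
    """Average HR@k and NDCG@k over users (assumed aggregation: simple mean)."""
    users = list(ground_truth)
    hr = sum(hit_rate_at_k(predictions[u], ground_truth[u], k) for u in users)
    ndcg = sum(ndcg_at_k(predictions[u], ground_truth[u], k) for u in users)
    return hr / len(users), ndcg / len(users)


if __name__ == "__main__":
    # Toy data, not drawn from the UCI Online Retail dataset.
    predictions = {"u1": ["A", "B", "C", "D"], "u2": ["X", "Y", "Z", "W"]}
    ground_truth = {"u1": {"C"}, "u2": {"Q"}}
    print(evaluate(predictions, ground_truth, k=3))
```

Under this multi-item definition the ideal DCG grows with k, so NDCG@k is not guaranteed to increase as k grows, whereas HR@k can only stay the same or rise.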
Surana et al. studied this question.