Abstract Predicting demographic age range from voice assistant data is important for developing personalized user applications. This study introduces the Adaptive Extremely Random Trees via Reinforcement Learning (AERT-RL) model, a new method that uses adaptive hyperparameter optimization to improve prediction performance. Categorical verbal data from voice assistants were first converted to numerical representations to facilitate processing. The Borderline-Synthetic Minority Oversampling Technique was then applied to address class imbalance, and mutual information (MI) was used to identify effective features. Seven ensemble models were run during the classification phase, and their results were compared. The proposed AERT-RL model outperformed all other classifiers, reaching approximately 90% accuracy, a Kappa of 0.86, and an F-score of 0.89. The results indicate that reinforcement learning overcomes limitations of traditional optimization techniques, enabling adaptive and robust optimization of model parameters. They also show that integrating the three stages (data quantization, MI-based feature selection, and the developed AERT-RL model) yields more effective performance. In short, this research presents an efficient computational model for detecting demographic age range from categorical voice data. The proposed AERT-RL model can also serve as a powerful alternative to traditional natural language processing methods.
Yücelbaş et al. (Tue,) studied this question.
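To make the MI-based feature identification step mentioned in the abstract more concrete, the following is a minimal sketch of how mutual information between a categorical feature and a class label can be computed and used to rank features. It is not the authors' implementation: the helper names `mutual_information` and `rank_features`, the feature names, and the age-range labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Estimate I(X; Y) in nats between two discrete sequences from counts."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    n = len(feature)
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return (feature name, MI score) pairs, highest score first."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: these columns and labels are assumptions, not study data.
columns = {
    "request_type": ["music", "music", "news", "news"],
    "device": ["phone", "speaker", "phone", "speaker"],
}
labels = ["18-29", "18-29", "30-49", "30-49"]

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy example `request_type` determines the label exactly, so its score equals the label entropy, while `device` is independent of the label and scores zero. In a pipeline like the one described, only the top-ranked features would be passed to the classifier.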