The analysis of medical data presents an opportunity for healthcare systems to support decision-making and improve patient outcomes. In this context, the automated analysis of user-generated drug reviews offers a promising approach for monitoring medication safety, understanding patient experiences, and detecting potential adverse effects in real time. This study advances sentiment analysis for pharmacovigilance by introducing a data-centric framework that incorporates a GenAI-powered labeling system for reliable and interpretable data annotation. A corpus of 213,869 user-generated drug reviews was processed through a hybrid labeling pipeline that reconciles user ratings, lexicon-based polarity, and zero-shot transformer predictions, with GPT-5.2 as a fallback mechanism. This strategy resolves sentiment ambiguity, particularly the frequent misalignment between user-assigned ratings and the underlying textual sentiment, by leveraging contextual understanding rather than relying solely on numerical scores. Drug review representations are enhanced using the Qwen3-Embedding-0.6B model, which improves the capture of semantic nuances. Evaluated through 10-fold stratified cross-validation, the proposed labeling framework combined with a Random Forest classifier achieves a classification accuracy of 96.45%, with per-class analysis confirming consistent performance across all sentiment categories. Cross-source validation on an independent drug review dataset of 4,091 reviews and a threshold sensitivity analysis further support the robustness and generalizability of the proposed approach.
Vouzis et al. (Sat,) studied this question.
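To make the idea of a hybrid labeling pipeline with a fallback more concrete, the following is a minimal sketch of one way three signals could be reconciled. It is not the procedure used in this study: the function names, the rating cut-offs (`low`, `high`), the polarity `margin`, the three-class label set, and the majority-vote rule are all illustrative assumptions, and the `fallback` callable is a hypothetical stand-in for a call to a large language model.

```python
from collections import Counter
from typing import Callable, List, Tuple


def rating_to_label(rating: float, low: float = 4.0, high: float = 7.0) -> str:
    """Map a numeric user rating to a sentiment label (cut-offs are assumed)."""
    if rating <= low:
        return "negative"
    if rating >= high:
        return "positive"
    return "neutral"


def polarity_to_label(polarity: float, margin: float = 0.05) -> str:
    """Map a lexicon polarity score in [-1, 1] to a label (margin is assumed)."""
    if polarity > margin:
        return "positive"
    if polarity < -margin:
        return "negative"
    return "neutral"


def reconcile(
    rating: float,
    polarity: float,
    zero_shot_label: str,
    fallback: Callable[[List[str]], str],
) -> Tuple[str, str]:
    """Return (label, source) by majority vote, deferring to `fallback` on a three-way split."""
    votes = [rating_to_label(rating), polarity_to_label(polarity), zero_shot_label]
    label, count = Counter(votes).most_common(1)[0]
    if count >= 2:
        # At least two of the three signals agree, so the fallback is not needed.
        return label, "majority"
    # All three signals disagree: hand the ambiguous case to the fallback labeler.
    return fallback(votes), "fallback"


def demo_fallback(votes: List[str]) -> str:
    """Hypothetical placeholder for an LLM call; always answers 'negative'."""
    return "negative"


agree = reconcile(9, 0.6, "positive", demo_fallback)
split = reconcile(9, -0.5, "neutral", demo_fallback)
```

In this sketch, a high rating paired with negative wording and a neutral zero-shot prediction is exactly the kind of rating-text mismatch that gets routed to the fallback, while cases where two signals agree are labeled without it.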