Pharmacovigilance requires automated systems that extract biomedical entities and their relationships from text, because manual processes are inefficient and error-prone. This study develops a reproducible pipeline for Named Entity Recognition (NER) and pattern-based proxy relation formation, focusing on drug side effects related to breast cancer. The contribution is twofold: a domain-specific annotated dataset for pharmacovigilance NER, and a reproducible pipeline for proxy-based relation analysis. The experimental setup covers four compact transformer models: MobileBERT, DistilBERT, TinyBERT, and ALBERT. Evaluation uses accuracy, precision, recall, F1-score, ROC AUC, and computational efficiency metrics. ALBERT achieves the highest NER performance (F1-score = 0.9261), while DistilBERT attains the best ROC AUC (0.9037). TinyBERT is the most efficient model, with 4.57 million parameters, 4.68 GFLOPs, and an average training time of 45.8 seconds per scenario. These results show a trade-off between accuracy and computational efficiency under the evaluated setting. The generated relations act as sentence-level proxy indicators of potential drug–adverse event associations and serve as a preliminary triage layer that requires expert validation, not as a high-precision system. The approach does not account for negation, uncertainty, or cross-sentence context, which may introduce false positive associations. Despite these limitations, the pipeline provides a reproducible baseline for exploratory pharmacovigilance analysis.
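The sentence-level proxy relations described above can be illustrated with a minimal sketch. The abstract does not specify the actual patterns used, so this example assumes the simplest possible rule: every drug entity is paired with every adverse-event entity recognized in the same sentence. The `Entity` type, the `DRUG` and `ADVERSE_EVENT` labels, the `proxy_relations` function, and the toy input are all hypothetical and are not taken from the study.

```python
from itertools import product
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    """A recognized entity span; the label names are assumptions for this sketch."""
    text: str
    label: str  # assumed labels: "DRUG" or "ADVERSE_EVENT"


def proxy_relations(sentences: List[List[Entity]]) -> List[Tuple[int, str, str]]:
    """Pair each DRUG with each ADVERSE_EVENT that occurs in the same sentence.

    Returns (sentence_index, drug, adverse_event) triples. No negation,
    uncertainty, or cross-sentence handling is attempted, so the output is
    a candidate list for expert review, not a set of confirmed relations.
    """
    relations = []
    for idx, entities in enumerate(sentences):
        drugs = [e.text for e in entities if e.label == "DRUG"]
        events = [e.text for e in entities if e.label == "ADVERSE_EVENT"]
        # Cartesian product: every drug-event combination within the sentence.
        for drug, event in product(drugs, events):
            relations.append((idx, drug, event))
    return relations


# Toy NER output for three sentences (illustrative only, not study data).
example = [
    [Entity("tamoxifen", "DRUG"),
     Entity("hot flashes", "ADVERSE_EVENT"),
     Entity("nausea", "ADVERSE_EVENT")],
    [Entity("letrozole", "DRUG")],  # no adverse event, so no relation is formed
    [Entity("anastrozole", "DRUG"),
     Entity("joint pain", "ADVERSE_EVENT")],
]

for relation in proxy_relations(example):
    print(relation)
```

Because the rule is pure co-occurrence, a sentence such as "tamoxifen did not cause nausea" would still yield a pair, which is the kind of false positive the abstract attributes to missing negation handling.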
Triwibowo et al. studied this question.