Abstract

Background: General-domain sentiment models have been found ineffective at distinguishing positivity from negativity in Intensive Care Unit (ICU) clinical notes, and domain-specific sentiment models are recommended instead. Although there are several common approaches to sentiment analysis, little work has compared and evaluated the effectiveness of specialized models, largely because of the difficulty of recruiting clinical annotators and of accessing clinical sentiment data. This study makes three contributions: (1) MIMIC-III-Ext-Notes-Sentiment, the first public ICU-specific ground-truth dataset labeled by clinicians for investigating clinical sentiment polarity in ICU clinical notes; (2) SentimentICUModel, an effective model for classifying clinical sentiment in ICU narratives on the ground truth; and (3) a guiding comparison of the effectiveness of a range of approaches to clinical sentiment classification on the dataset.

Methods: We recruited five clinicians to annotate notes for the ground truth. Annotators also indicated which pieces of note text influenced their labeling. Six clinical-specific models were compared on the ground truth.

Results: The annotation task was challenging because of clinicians’ workload and spanned 15 months. The ground-truth data was formed on the basis of inter-annotator agreement analysis. The similarity of clinicians’ extracts aligned with their level of agreement. Clinical language models provide comparable accuracy (up to 82%), with the top score achieved by ClinicalT5, which is being released as SentimentICUModel. They outperform a keyword-based lexicon (p < 0.05; 95% CI: −0.47, −0.28).

Conclusion: Clinical language models have demonstrated effectiveness in identifying clinical sentiment within clinical notes, enabling early detection of sudden changes and exploration of different patterns in patients’ ICU stays.
Nagoor et al. studied this question.
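The Results mention that the ground truth was formed from an inter-annotator agreement analysis, but the abstract does not say which agreement statistic was used. As an illustration only, the sketch below computes Fleiss' kappa, one common statistic for a fixed number of raters, over hypothetical labels from five annotators. The function name `fleiss_kappa`, the label names, and the example data are assumptions made for this sketch and are not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    `ratings` is a list of lists: one inner list of category labels per item.
    This is an illustrative sketch, not the analysis used in the study.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: for each item, the share of rater pairs that agree.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    if p_expected == 1.0:
        # Only one category was ever used; agreement is trivially perfect.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical labels from five annotators on four notes.
    example = [
        ["positive", "positive", "positive", "positive", "negative"],
        ["negative", "negative", "negative", "negative", "negative"],
        ["positive", "negative", "neutral", "positive", "negative"],
        ["neutral", "neutral", "neutral", "positive", "neutral"],
    ]
    print(f"Fleiss' kappa: {fleiss_kappa(example):.3f}")
```

A value of 1 indicates perfect agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement. How such a statistic would translate into ground-truth labels (for example, a threshold or a majority vote) is not specified in the abstract and is left out of the sketch.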