What question did this study set out to answer?

This study aims to evaluate the effectiveness of various sentiment analysis models on ICU clinical notes.

May 20, 2026 · Open Access

Clinical sentiment analysis of ICU notes: a dataset and a comparison of lexicon and clinical language model methods

Key Points

This study aims to evaluate the effectiveness of various sentiment analysis models on ICU clinical notes.
Five clinicians annotated notes to create a ground truth dataset over 15 months.
Six clinical-specific sentiment models were compared against the ground truth dataset.
Inter-annotator agreement analysis was used to assess the annotation quality.
The ClinicalT5 model achieved the highest accuracy (82%) in identifying sentiment.
Clinical language models significantly outperformed keyword-based lexicon methods (p < 0.05; 95% CI −0.47 to −0.28).
Clinical sentiment analysis enables early detection of changes in patient conditions.
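
As an illustration of the inter-annotator agreement step mentioned in the key points, the sketch below computes Fleiss' kappa for several annotators labeling the same notes. The summary does not say which agreement statistic the study used, so Fleiss' kappa is an assumption here, chosen because it handles more than two raters. The label names and the toy ratings are invented for illustration and are not taken from the dataset.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of notes, each rated by the same number of annotators.

    ratings: one list per note, holding one label per annotator.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(labels) != n_raters for labels in ratings):
        raise ValueError("every note needs the same number of ratings")

    category_totals = Counter()
    per_item_agreement = []
    for labels in ratings:
        counts = Counter(labels)
        category_totals.update(counts)
        # Share of annotator pairs that agree on this note.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_item_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))

    observed = sum(per_item_agreement) / n_items
    total_ratings = n_items * n_raters
    # Agreement expected by chance from the overall label proportions.
    expected = sum((c / total_ratings) ** 2 for c in category_totals.values())
    if expected == 1:
        return float("nan")  # only one label was ever used; kappa is undefined
    return (observed - expected) / (1 - expected)


# Toy ratings (hypothetical): four notes, five annotators each.
toy_ratings = [
    ["negative"] * 5,
    ["positive"] * 5,
    ["negative", "negative", "negative", "positive", "positive"],
    ["negative", "negative", "positive", "positive", "positive"],
]
kappa = fleiss_kappa(toy_ratings)
```

On the toy ratings, two notes have full agreement and two are split 3 to 2, which gives a kappa of about 0.4. A real analysis would also need a rule for turning agreement into a final ground-truth label, which the summary does not describe.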

Abstract

Background: General-domain sentiment models have been found ineffective at distinguishing positivity from negativity in Intensive Care Unit (ICU) clinical notes, and domain-specific sentiment models are recommended instead. Although there are several common approaches to sentiment analysis, little work has compared and evaluated specialized models, largely because clinical annotators are difficult to recruit and clinical sentiment data is difficult to access. This study makes three contributions: (1) MIMIC-III-Ext-Notes-Sentiment, the first public ICU-specific ground-truth dataset labeled by clinicians for investigating clinical sentiment polarity in ICU clinical notes; (2) SentimentICUModel, an effective model for classifying clinical sentiment in ICU narratives, evaluated on the ground truth; and (3) a guiding comparison of the effectiveness of a range of approaches to clinical sentiment classification on the dataset.

Methods: We recruited five clinicians to annotate notes for the ground truth. Annotators indicated which pieces of note text influenced their labeling. Six clinical-specific models were compared on the ground truth.

Results: Annotation was challenging because of clinicians’ workload and spanned 15 months. The ground-truth data was formed on the basis of inter-annotator agreement analysis. The similarity of the text extracts that clinicians selected aligned with their level of agreement. The clinical language models achieved comparable accuracy (up to 82%); the top score was achieved by ClinicalT5, which is being released as SentimentICUModel. They outperformed the keyword-based lexicon (p < 0.05; 95% CI −0.47 to −0.28).

Conclusion: Clinical language models have demonstrated effectiveness in identifying clinical sentiment within clinical notes, enabling early detection of sudden changes and exploration of different patterns in patients’ ICU stays.
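
To make the lexicon-versus-model comparison concrete, here is a minimal sketch of a keyword-based lexicon classifier and a paired bootstrap confidence interval for the accuracy difference between two systems scored on the same notes. The abstract does not state which lexicon or which statistical procedure the study used, so everything here is an assumption: the word lists are a hypothetical toy lexicon, the three label names are invented, and the percentile bootstrap is simply one common way to obtain a 95% CI for a difference in accuracy.

```python
import random

# Hypothetical toy lexicon; not the lexicon evaluated in the study.
POSITIVE = {"stable", "improving", "improved", "tolerating", "comfortable"}
NEGATIVE = {"worsening", "deteriorating", "unresponsive", "hypotensive", "distress"}


def lexicon_label(note):
    """Label a note by counting positive and negative keywords."""
    words = [w.strip(".,;:()").lower() for w in note.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def bootstrap_accuracy_diff(pred_a, pred_b, gold, n_resamples=2000, seed=0):
    """Percentile 95% CI for accuracy(pred_a) - accuracy(pred_b).

    Notes are resampled with replacement, and both systems are scored on
    the same resampled notes so that the comparison stays paired.
    """
    rng = random.Random(seed)
    n = len(gold)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(pred_a[i] == gold[i] for i in idx) / n
        acc_b = sum(pred_b[i] == gold[i] for i in idx) / n
        diffs.append(acc_a - acc_b)
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper
```

Passing the lexicon predictions as `pred_a` and a language model's predictions as `pred_b` would yield a negative interval whenever the lexicon is reliably less accurate. An interval that excludes zero is the usual informal reading of a significant difference, although the study's own test may differ.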

