What question did this study set out to answer?

The study aims to develop a calibrated natural-language inference (NLI) model for verifying clinical claims on the MedNLI benchmark.

June 15, 2026 | Open Access

Silence Is Safety: A Calibrated Clinical NLI Verifier for MedNLI

Key Points

The study aims to develop a calibrated natural-language inference (NLI) model for verifying clinical claims on the MedNLI benchmark.
The authors developed the AetherMind MedNLI Clinical NLI Verifier, a model for three-way inference (entailment, neutral, contradiction).
On the MedNLI test split, the model achieved 88.19% accuracy and a macro F1 score of 88.21%.
Contradiction recall reached 92.41% and contradiction precision reached 93.99%.
Temperature scaling fitted on the development split reduced expected calibration error from 0.090 to 0.037.
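To make the calibration step concrete, the following is a minimal sketch of temperature scaling and expected calibration error (ECE), not the authors' implementation. It assumes the verifier exposes raw three-class logits. The grid search over temperatures, the 10 equal-width confidence bins, and the synthetic overconfident logits are all illustrative assumptions; the preprint summary does not state how the temperature was optimized or how ECE was binned.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a scalar temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the true labels."""
    probs = softmax(logits, temperature)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(dev_logits, dev_labels, grid=np.linspace(0.5, 5.0, 451)):
    """Pick the grid temperature that minimises NLL on the development split.
    (Grid search is an assumption; gradient-based fitting is also common.)"""
    losses = [nll(dev_logits, dev_labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin-size-weighted gap between mean confidence and accuracy."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)

# Synthetic stand-in for model outputs (NOT MedNLI data): three-class logits
# that are deliberately inflated so the uncalibrated model is overconfident.
rng = np.random.default_rng(0)
n = 2000
labels = rng.integers(0, 3, size=n)
logits = rng.normal(size=(n, 3))
logits[np.arange(n), labels] += 1.5
logits *= 4.0

dev_logits, dev_labels = logits[:1000], labels[:1000]
test_logits, test_labels = logits[1000:], labels[1000:]

T = fit_temperature(dev_logits, dev_labels)
ece_before = expected_calibration_error(softmax(test_logits), test_labels)
ece_after = expected_calibration_error(softmax(test_logits, T), test_labels)
print(f"T={T:.2f}  ECE before={ece_before:.3f}  after={ece_after:.3f}")
```

Because a single scalar temperature rescales all logits equally, it changes confidence but not the predicted class, so accuracy and F1 are unaffected by this step.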

Abstract

This preprint presents AetherMind MedNLI Clinical NLI Verifier, a calibrated clinical natural-language inference model for the MedNLI benchmark. The model performs three-way inference over clinical premise-hypothesis pairs: entailment, neutral, and contradiction. On the MedNLI test split, the verifier achieves 88.19% accuracy (95% confidence interval: 86.50% to 89.87%), macro F1 of 88.21%, contradiction recall of 92.41%, and contradiction precision of 93.99%. Temperature scaling fitted on the development split improves expected calibration error from 0.090 to 0.037. The paper also evaluates selective abstention for safer clinical claim verification workflows.
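As an illustration of what selective abstention could look like, here is a minimal sketch that withholds a prediction when the top calibrated probability falls below a threshold, then reports coverage and accuracy on the answered pairs. The abstract does not describe the paper's actual abstention rule, so the max-probability criterion, the `ABSTAIN` sentinel, the 0.7 threshold, the class order, and the toy probabilities are all assumptions made for this example.

```python
import numpy as np

ABSTAIN = -1  # hypothetical sentinel meaning "no verdict returned"

def selective_predict(probs, threshold):
    """Return the argmax class when confidence >= threshold, else ABSTAIN."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= threshold, preds, ABSTAIN)

def coverage_and_accuracy(preds, labels):
    """Coverage = fraction answered; accuracy is computed on answered pairs only."""
    answered = preds != ABSTAIN
    coverage = float(answered.mean())
    if not answered.any():
        return coverage, float("nan")
    accuracy = float((preds[answered] == labels[answered]).mean())
    return coverage, accuracy

# Toy calibrated probabilities; assumed class order:
# 0 = entailment, 1 = neutral, 2 = contradiction.
probs = np.array([
    [0.90, 0.07, 0.03],
    [0.40, 0.35, 0.25],
    [0.10, 0.15, 0.75],
    [0.34, 0.33, 0.33],
    [0.05, 0.85, 0.10],
])
labels = np.array([0, 1, 2, 0, 2])

preds = selective_predict(probs, threshold=0.7)
coverage, accuracy = coverage_and_accuracy(preds, labels)
print(preds.tolist(), coverage, accuracy)
```

Raising the threshold trades coverage for accuracy on the answered pairs, which is why a rule of this kind is only meaningful when the underlying probabilities are well calibrated.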
