This preprint presents AetherMind MedNLI Clinical NLI Verifier, a calibrated clinical natural-language inference model for the MedNLI benchmark. The model performs three-way inference over clinical premise-hypothesis pairs, assigning each pair one of three labels: entailment, neutral, or contradiction. On the MedNLI test split, the verifier achieves 88.19% accuracy (95% confidence interval: 86.50% to 89.87%), a macro F1 of 88.21%, contradiction recall of 92.41%, and contradiction precision of 93.99%. Temperature scaling fitted on the development split reduces expected calibration error from 0.090 to 0.037. The paper also evaluates selective abstention for safer clinical claim verification workflows.
Sameer Najm (Sat,) studied this question.
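The calibration and abstention steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fit_temperature`, `expected_calibration_error`, `selective_predict`), the grid search over temperatures, the 10 equal-width confidence bins, and the abstention threshold are all assumptions made for illustration. The sketch fits a single temperature by minimizing negative log-likelihood on development-split logits, measures expected calibration error (ECE), and abstains when the top-class confidence falls below a threshold.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a scalar temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    probs = softmax(logits, temperature)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(true_class_probs + 1e-12)))


def fit_temperature(dev_logits, dev_labels, grid=None):
    """Pick the temperature minimizing dev-split NLL.

    Assumption: a simple grid search stands in for whatever optimizer
    the paper actually uses.
    """
    if grid is None:
        grid = np.linspace(0.05, 10.0, 200)
    losses = [nll(dev_logits, dev_labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins (the bin count is an assumption)."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Weight each bin's |accuracy - confidence| gap by its share of samples.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)


def selective_predict(probs, threshold):
    """Return the predicted class, or -1 (abstain) when confidence < threshold."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    return np.where(conf >= threshold, pred, -1)
```

In use, the temperature would be fitted once on development-split logits and then applied unchanged to test-split logits before computing ECE or applying the abstention threshold. Because dividing all logits by one positive scalar does not change the argmax, this kind of calibration leaves accuracy unchanged and only adjusts the confidence scores.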