November 26, 2025 | Open Access

Out-of-distribution detection in text using statistical techniques

Abstract

Out-of-distribution data points diverge from the general profile of the data, which is typically defined by the specific task for which a machine learning model is being built. Machine learning models are more reliable when out-of-distribution detection is part of the pipeline. Out-of-domain detection models are used not only to screen the input to a model but also to scrutinise the output of a generative model, a process known as selective generation [1]. In the literature, the Mahalanobis distance is widely used for anomaly detection. In this work, we leverage the relation of the Mahalanobis distance to Hotelling’s T-squared distribution and the chi-squared distribution, and adapt these relations for inference in the out-of-domain detection task. Data is usually categorised into three types: a) in-domain, b) out-of-domain, and c) background data. We explore approaches a) constructed solely with in-domain data and b) constructed using both in-domain and background data. Our proposed approaches are background-free and efficient, and they show promising results compared with existing work in the literature that employs background data. We show that the approach based on Hotelling’s T-squared improves upon the chi-squared approach.
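The statistical relations named in the abstract can be illustrated with a minimal sketch. It assumes each text has already been mapped to a p-dimensional embedding (the encoder is not shown) and that in-domain embeddings are roughly Gaussian. Given n in-domain embeddings with sample mean x̄ and unbiased sample covariance S, the squared Mahalanobis distance of a new point x is D² = (x - x̄)ᵀ S⁻¹ (x - x̄). If the mean and covariance were known exactly, D² would follow a chi-squared distribution with p degrees of freedom. When both are estimated from n samples, the standard Hotelling's T-squared result for a new independent observation gives D² ~ p(n+1)(n-1) / (n(n-p)) · F(p, n-p). Flagging x as out-of-domain when D² exceeds the (1 - α) quantile of either distribution yields the two thresholds below. The function names, the significance level `alpha`, and the use of NumPy/SciPy are illustrative assumptions; this is not the authors' implementation, and the paper's exact scoring and calibration may differ.

```python
import numpy as np
from scipy import stats


def fit_gaussian(train):
    """Estimate the in-domain mean and the inverse unbiased sample covariance."""
    n, p = train.shape
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)  # unbiased: divides by n - 1
    # pinv is a pragmatic guard against near-singular covariance matrices
    cov_inv = np.linalg.pinv(cov)
    return mean, cov_inv, n, p


def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of each row of x to the in-domain mean."""
    diff = np.atleast_2d(x) - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)


def chi2_threshold(p, alpha):
    """Cut-off if mean and covariance were known exactly: D^2 ~ chi2(p)."""
    return stats.chi2.ppf(1.0 - alpha, df=p)


def hotelling_threshold(n, p, alpha):
    """Cut-off accounting for estimation from n samples (requires n > p).

    For a new observation independent of the training set,
    D^2 ~ p (n + 1)(n - 1) / (n (n - p)) * F(p, n - p).
    """
    if n <= p:
        raise ValueError("need more training points than dimensions (n > p)")
    scale = p * (n + 1) * (n - 1) / (n * (n - p))
    return scale * stats.f.ppf(1.0 - alpha, p, n - p)


def is_out_of_domain(x, train, alpha=0.05, method="hotelling"):
    """Hypothetical helper: True where x falls outside the (1 - alpha) region."""
    mean, cov_inv, n, p = fit_gaussian(train)
    d2 = mahalanobis_sq(x, mean, cov_inv)
    if method == "hotelling":
        tau = hotelling_threshold(n, p, alpha)
    else:
        tau = chi2_threshold(p, alpha)
    return d2 > tau


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 8))        # stand-in for in-domain embeddings
    far = rng.normal(loc=6.0, size=(5, 8))   # shifted cluster standing in for OOD inputs
    print("chi2 threshold:", chi2_threshold(8, 0.05))
    print("Hotelling threshold:", hotelling_threshold(200, 8, 0.05))
    print("flags on shifted points:", is_out_of_domain(far, train))
```

At conventional significance levels such as α = 0.05, the Hotelling threshold is larger than the chi-squared one for finite n and converges to it as n grows, so the distinction matters most when the number of in-domain examples is small relative to the embedding dimension. The helper refits the Gaussian on every call for brevity; a real pipeline would fit once and reuse the mean and inverse covariance.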
