May 27, 2026 · Open Access

When Models Fail: Trustworthy Anomaly Detection Under Distributional Drift via Dual-Layer Monitoring of Data and AI Behaviour

Key Points

This study aims to enhance anomaly detection in AI systems by addressing issues arising from distributional drift.
The authors developed MARLIN-AD, a dual-layer framework that monitors anomalies in both data streams and model behaviour.
The framework was evaluated on fully synthetic data, generated to inject anomalies and simulate distributional drift across multiple scenarios.
Results were validated with non-parametric tests, permutation-based inference, and Bayesian bootstrap analysis.
Model performance degraded strongly and consistently under drift conditions.
Statistical tests indicated that drifted configurations were unlikely to outperform the baseline model (p < 0.05).
Ablation analysis supported model-centric monitoring as the primary signal for detecting degradation.
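
The two validation ideas named above, permutation-based inference and Bayesian bootstrap analysis, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the per-seed F1 values are invented for illustration, and the function names (`permutation_p_value`, `prob_drifted_better`) are hypothetical. The sketch assumes one performance score per random seed for a baseline and a drifted configuration.

```python
import random
from statistics import fmean


def permutation_p_value(baseline, drifted, n_perm=5000, seed=0):
    """One-sided permutation test on mean(baseline) - mean(drifted)."""
    rng = random.Random(seed)
    observed = fmean(baseline) - fmean(drifted)
    pooled = list(baseline) + list(drifted)
    n = len(baseline)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled scores
        if fmean(pooled[:n]) - fmean(pooled[n:]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def dirichlet_weights(k, rng):
    """Flat Dirichlet(1, ..., 1) weights via normalised exponential draws."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def prob_drifted_better(baseline, drifted, n_draws=5000, seed=0):
    """Bayesian bootstrap estimate of P(mean drifted > mean baseline)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        wb = dirichlet_weights(len(baseline), rng)
        wd = dirichlet_weights(len(drifted), rng)
        mean_b = sum(w * x for w, x in zip(wb, baseline))
        mean_d = sum(w * x for w, x in zip(wd, drifted))
        if mean_d > mean_b:
            wins += 1
    return wins / n_draws


# Hypothetical per-seed F1 scores (illustrative numbers, not from the paper).
baseline_f1 = [0.91, 0.93, 0.90, 0.92, 0.94, 0.92, 0.91, 0.93]
drifted_f1 = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73, 0.72, 0.70]

p_value = permutation_p_value(baseline_f1, drifted_f1)
p_better = prob_drifted_better(baseline_f1, drifted_f1)
print(f"permutation p-value: {p_value:.4f}")
print(f"P(drifted > baseline): {p_better:.4f}")
```

The permutation test asks how often a random relabelling of the scores produces a gap at least as large as the observed one. The Bayesian bootstrap reweights each group's scores and reports how often the drifted mean comes out ahead, which corresponds to the kind of posterior probability the abstract describes as near zero.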

Abstract

Artificial intelligence (AI) plays an increasingly important role in maritime systems, enabling advanced monitoring, anomaly detection, and decision support. However, the reliability of such systems is challenged by distributional drift, which may significantly degrade model performance over time. While anomaly detection has been extensively studied in the context of data irregularities, considerably less attention has been devoted to detecting anomalies in AI model behaviour itself. In this study, we propose MARLIN-AD (Maritime AI Reliability and Learning Intelligence Network—Anomaly Detection), a dual-layer anomaly detection framework designed to jointly monitor anomalies in data streams and anomalies in model behaviour. The framework integrates data-centric detection methods with model-centric monitoring techniques, including distributional shift detection and prediction stability analysis, within a unified anomaly scoring mechanism. The evaluation is conducted using a fully controlled synthetic data generation process, enabling precise injection of anomalies and systematic simulation of distributional drift across multiple scenarios. Experimental results demonstrate a strong and consistent degradation of model performance under drift conditions. Statistical validation using non-parametric tests, permutation-based inference, and Bayesian bootstrap analysis confirms that the observed degradation is both statistically significant and practically meaningful. In particular, posterior distributions of performance differences indicate a near-zero probability that drifted configurations outperform the baseline model. The results highlight that model degradation under drift exhibits a consistent and structured pattern, reproducible across multiple independent random seeds. 
Furthermore, the study shows that model-centric monitoring provides the primary signal for detecting degradation—a finding corroborated by ablation analysis—while data-centric monitoring enhances interpretability and root-cause attribution. A pilot validation on publicly available Automatic Identification System (AIS) data from the Danish Maritime Authority confirms the applicability of the data-level component to real operational trajectories. The proposed framework contributes to the development of trustworthy AI systems by enabling comprehensive monitoring of both data integrity and model behaviour in dynamic environments.
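
As a rough illustration of the dual-layer idea, the sketch below combines a data-level signal with two model-level signals into one score. Everything in it is an assumption made for illustration, not the MARLIN-AD design: the one-feature `toy_model`, the robust z-score detector, the two-sample Kolmogorov-Smirnov statistic as the shift measure, the perturbation-based stability measure, and the weights `w_data` and `w_model` are all hypothetical choices.

```python
import math
import random
from statistics import fmean, median


def data_layer_score(window, reference, cutoff=3.5):
    """Fraction of window points whose robust z-score (median/MAD) exceeds cutoff."""
    med = median(reference)
    mad = median([abs(x - med) for x in reference]) or 1e-9
    return fmean([1.0 if 0.6745 * abs(x - med) / mad > cutoff else 0.0
                  for x in window])


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def toy_model(x):
    """Hypothetical stand-in for a deployed model: logistic score of one feature."""
    return 1.0 / (1.0 + math.exp(-(x - 10.0)))


def model_layer_scores(window, reference, rng, eps=0.1):
    """Return (shift, instability) computed from the model's output scores."""
    ref_scores = [toy_model(x) for x in reference]
    cur_scores = [toy_model(x) for x in window]
    shift = ks_statistic(ref_scores, cur_scores)
    # Stability: mean change in score when each input is slightly perturbed.
    instability = fmean([abs(toy_model(x + rng.gauss(0.0, eps)) - s)
                         for x, s in zip(window, cur_scores)])
    return shift, instability


def unified_score(window, reference, rng, w_data=0.4, w_model=0.6):
    """Weighted combination of both layers; the weights are arbitrary assumptions."""
    data_score = data_layer_score(window, reference)
    shift, instability = model_layer_scores(window, reference, rng)
    return w_data * data_score + w_model * max(shift, instability)


rng = random.Random(42)
reference = [rng.gauss(8.0, 1.0) for _ in range(300)]
clean_window = [rng.gauss(8.0, 1.0) for _ in range(300)]
drifted_window = [rng.gauss(11.0, 1.5) for _ in range(300)]  # simulated drift

clean_score = unified_score(clean_window, reference, rng)
drifted_score = unified_score(drifted_window, reference, rng)
print(f"clean window score:   {clean_score:.3f}")
print(f"drifted window score: {drifted_score:.3f}")
```

In this toy setting the model is a monotone function of a single feature, so the shift in its output scores mirrors the shift in the input. With a real multi-feature model the two layers can disagree, which is the case where monitoring them separately becomes informative.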
