Artificial intelligence (AI) plays an increasingly important role in maritime systems, enabling advanced monitoring, anomaly detection, and decision support. However, the reliability of such systems is challenged by distributional drift, which may significantly degrade model performance over time. While anomaly detection has been extensively studied in the context of data irregularities, considerably less attention has been devoted to detecting anomalies in AI model behaviour itself. In this study, we propose MARLIN-AD (Maritime AI Reliability and Learning Intelligence Network—Anomaly Detection), a dual-layer anomaly detection framework designed to jointly monitor anomalies in data streams and anomalies in model behaviour. The framework integrates data-centric detection methods with model-centric monitoring techniques, including distributional shift detection and prediction stability analysis, within a unified anomaly scoring mechanism. The evaluation is conducted using a fully controlled synthetic data generation process, enabling precise injection of anomalies and systematic simulation of distributional drift across multiple scenarios. Experimental results demonstrate a strong and consistent degradation of model performance under drift conditions. Statistical validation using non-parametric tests, permutation-based inference, and Bayesian bootstrap analysis confirms that the observed degradation is both statistically significant and practically meaningful. In particular, posterior distributions of performance differences indicate a near-zero probability that drifted configurations outperform the baseline model. The results highlight that model degradation under drift exhibits a consistent and structured pattern, reproducible across multiple independent random seeds. 
Furthermore, the study shows that model-centric monitoring provides the primary signal for detecting degradation—a finding corroborated by ablation analysis—while data-centric monitoring enhances interpretability and root-cause attribution. A pilot validation on publicly available Automatic Identification System (AIS) data from the Danish Maritime Authority confirms the applicability of the data-level component to real operational trajectories. The proposed framework contributes to the development of trustworthy AI systems by enabling comprehensive monitoring of both data integrity and model behaviour in dynamic environments.
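To illustrate the kind of statement made above about posterior distributions of performance differences, the following is a minimal sketch of a Bayesian bootstrap, not the implementation used in this study. It assumes paired per-seed scores for a baseline and a drifted configuration; the function name `bayesian_bootstrap_prob_better`, the number of draws, and the score values are all hypothetical and chosen only for illustration.

```python
import random


def bayesian_bootstrap_prob_better(baseline, drifted, n_draws=4000, seed=0):
    """Estimate P(mean(drifted - baseline) > 0) with a Bayesian bootstrap.

    Assumption: scores are paired (same seed / same scenario index).
    """
    if len(baseline) != len(drifted) or not baseline:
        raise ValueError("expected two non-empty, equal-length score lists")
    rng = random.Random(seed)
    diffs = [d - b for b, d in zip(baseline, drifted)]
    wins = 0
    for _ in range(n_draws):
        # Dirichlet(1, ..., 1) weights obtained by normalising Exponential(1) draws.
        g = [rng.expovariate(1.0) for _ in diffs]
        total = sum(g)
        weighted_mean = sum(w * x for w, x in zip(g, diffs)) / total
        if weighted_mean > 0:
            wins += 1
    return wins / n_draws


# Hypothetical per-seed F1 scores, invented for illustration only.
baseline_f1 = [0.91, 0.90, 0.92, 0.89, 0.91]
drifted_f1 = [0.74, 0.71, 0.77, 0.70, 0.73]

p_better = bayesian_bootstrap_prob_better(baseline_f1, drifted_f1)
print(f"P(drifted outperforms baseline) ~ {p_better:.3f}")
```

When every paired difference is negative, as in these made-up scores, every reweighted mean is negative as well, so the estimated probability is exactly zero. With real, noisier scores the estimate would typically be small but non-zero, which corresponds to a "near-zero probability" reading.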
Miller et al. (Mon,) studied this question.