What question did this study set out to answer?

This study aims to develop an interpretable machine learning framework for anomaly detection in inland water quality data.

June 13, 2026 · Open Access

National-scale anomaly detection in inland waters: an explainable multi-model AI framework for environmental governance

Key Points

This study aims to develop an interpretable machine learning framework for anomaly detection in inland water quality data.
Utilized over 17,000 observations from 23 states for anomaly detection.
Employed four detection models: Isolation Forest, One-Class Support Vector Machine, Elliptic Envelope, and Autoencoder.
Implemented SHAP-based explanations and t-SNE projections for interpretability.
Identified approximately 7.88% and 8% of observations as anomalous.
Detected hotspots of oxygen depletion and microbial contamination in peri-urban tanks in Karnataka and Uttar Pradesh, especially during post-monsoon seasons.
Isolation Forest and Autoencoder achieved best performance with F1 scores greater than 0.70.
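
The F1 figures above can be made concrete with a small sketch. The source does not say how the reference labels were built or how the models were combined, so two assumptions are made here: domain-threshold violations serve as the reference, and the ensemble flags an observation when any model flags it. The flag lists are toy values invented for illustration, not results from the paper, and the names `f1_score` and `union_flags` are hypothetical.

```python
def f1_score(flags, reference):
    """F1 of boolean anomaly flags against boolean reference labels."""
    tp = sum(1 for f, r in zip(flags, reference) if f and r)
    fp = sum(1 for f, r in zip(flags, reference) if f and not r)
    fn = sum(1 for f, r in zip(flags, reference) if not f and r)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def union_flags(*model_flags):
    """Assumed ensemble rule: flag an observation if any model flags it."""
    return [any(votes) for votes in zip(*model_flags)]


# Toy data (invented): True marks a threshold violation or a model flag.
reference = [True, False, False, True, False, True, False, False]
iso_flags = [True, False, False, True, False, False, True, False]
ae_flags = [False, False, False, True, False, True, False, False]

ensemble = union_flags(iso_flags, ae_flags)
for name, flags in [("isolation_forest", iso_flags),
                    ("autoencoder", ae_flags),
                    ("ensemble", ensemble)]:
    print(name, round(f1_score(flags, reference), 3))
```

In this toy case each model misses one violation, but the union recovers all three. That is one way an ensemble could catch every threshold violation even when no single model does.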

Abstract

Inland water-quality monitoring systems increasingly generate large volumes of environmental data, creating opportunities for advanced analytical methods to identify subtle pollution signals that traditional threshold-based monitoring may miss. This study proposes an interpretable, unsupervised machine-learning framework for detecting anomalies in inland water-quality monitoring data, using multiple complementary detection models and explainable AI techniques. The data comprise more than 17,000 observations across 23 states, curated by the Central Pollution Control Board (2021); a processed subset was used for model evaluation. The framework combines four detection models (Isolation Forest, One-Class Support Vector Machine, Elliptic Envelope, and Autoencoder) with ecological feature engineering (dissolved oxygen deficit, BOD/DO ratio, coliform load index) to increase sensitivity to complex pollution stressors. SHAP-based explanations, t-SNE projections and statistical comparisons were used to ensure interpretability and robust validation of the anomalies. The results show that about 7.88% and 8% of observations were anomalous, with hotspots characterised by oxygen depletion and microbial contamination identified in peri-urban tanks in Karnataka and Uttar Pradesh, especially during post-monsoon seasons. The ensemble identified all domain-specific threshold violations, with Isolation Forest and Autoencoder performing best (F1 > 0.70).
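
The abstract names three engineered features but does not define them. The following is a minimal sketch of one plausible reading, not the authors' formulas. It assumes that the deficit is saturation DO minus measured DO, that the ratio is BOD divided by DO with a small floor, and that the coliform load index is a log-scaled count. The function name, the fixed saturation value (roughly that of fresh water near 25 °C), and the 0.1 mg/L floor are all assumptions.

```python
import math

# Assumption: fixed saturation DO in mg/L (fresh water near 25 degrees C).
# A real pipeline would likely derive this from measured temperature.
DO_SATURATION_MG_L = 8.26

# Assumption: floor applied to DO so anoxic samples do not divide by zero.
DO_FLOOR_MG_L = 0.1


def engineer_features(do_mg_l, bod_mg_l, total_coliform_mpn):
    """Return illustrative ecological features for one observation."""
    # Oxygen shortfall relative to saturation, clipped at zero.
    do_deficit = max(DO_SATURATION_MG_L - do_mg_l, 0.0)
    # Organic load relative to available oxygen.
    bod_do_ratio = bod_mg_l / max(do_mg_l, DO_FLOOR_MG_L)
    # Coliform counts span orders of magnitude, so use a log scale.
    coliform_load_index = math.log10(1.0 + total_coliform_mpn)
    return {
        "do_deficit": do_deficit,
        "bod_do_ratio": bod_do_ratio,
        "coliform_load_index": coliform_load_index,
    }


# Hypothetical sample: low DO, elevated BOD, high coliform count.
print(engineer_features(do_mg_l=4.0, bod_mg_l=12.0, total_coliform_mpn=999.0))
```

Features of this kind put oxygen stress and microbial load on comparable scales before detection. That is the stated motivation for the feature engineering, whatever exact definitions the paper uses.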
