Managing water losses effectively is essential to the financial sustainability of water utilities, and it requires rapid detection of abnormal water consumption. This paper presents a framework for detecting apparent losses caused by meter bypasses and meter malfunctions using state-of-the-art models for tabular data. Three Gradient Boosted Decision Trees (GBDTs) were compared on a univariate and a hybrid dataset, examining the impact of time-series features used in isolation and in combination with registration information. Several resampling approaches were compared with a cost-sensitive loss function, which was considered as a way to avoid resampling altogether given concerns about noisy data and overfitting. The proposed methodology was evaluated with bootstrap techniques to assess statistical significance. Results showed that combining time-series and registration features significantly enhances model performance (up to 19% in F1-score), while the cost-sensitive loss function outperforms resampling methods on some metrics without introducing resampling artifacts. Tested on real water consumption data, the framework reduced the number of inspections needed to reach equivalent detection rates by up to 1.7 times relative to a Logistic Regression baseline and by up to 7.6 times relative to a random baseline.

• An inspection-oriented framework is proposed for abnormal water consumption detection.
• Cost-sensitive learning outperforms resampling methods while preserving the data distribution.
• A state-of-the-art model can be improved even with low-frequency data.
• Bootstrap evaluation reveals robust inspection efficiency under prior probability shifts.
• Ranking-based inspection reduces field inspections by up to 7.6× to detect 50% of abnormal cases.
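Two ideas in the abstract, the cost-sensitive loss and the inspection-reduction figures, can be illustrated with a small sketch. It is not the paper's code. Assumptions: all function names are hypothetical; `pos_weight` is a placeholder class weight, not a value reported by the authors; the loss is a standard class-weighted binary log-loss whose per-example gradient and Hessian are what a GBDT custom objective usually consumes; the random baseline uses the closed-form expected position of the k-th abnormal case in a random ordering, k(N + 1)/(P + 1); and the significance check is a plain percentile bootstrap.

```python
import math
import random


def weighted_logloss_grad_hess(raw_scores, labels, pos_weight):
    """Hypothetical helper: gradient and Hessian of a class-weighted log-loss.

    Abnormal cases (label 1) get weight `pos_weight`, normal cases weight 1.
    Derivatives are taken w.r.t. the raw (logit) score, as GBDT objectives expect.
    """
    grads, hessians = [], []
    for z, y in zip(raw_scores, labels):
        p = 1.0 / (1.0 + math.exp(-z))     # sigmoid of the raw score
        w = pos_weight if y == 1 else 1.0  # cost-sensitive weight instead of resampling
        grads.append(w * (p - y))
        hessians.append(w * p * (1.0 - p))
    return grads, hessians


def inspections_to_reach(scores, labels, target_recall):
    """Hypothetical helper: inspections needed when customers are visited in
    descending score order until `target_recall` of abnormal cases are found."""
    needed = math.ceil(target_recall * sum(labels))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    found = 0
    for k, i in enumerate(order, start=1):
        found += labels[i]
        if found >= needed:
            return k
    return len(scores)


def expected_random_inspections(n_total, n_pos, target_recall):
    """Expected inspections under a random visiting order: the k-th abnormal
    case sits, on average, at position k * (N + 1) / (P + 1)."""
    k = math.ceil(target_recall * n_pos)
    return k * (n_total + 1) / (n_pos + 1)


def bootstrap_reduction(scores, labels, target_recall, n_boot=1000, seed=0):
    """Hypothetical helper: 95% percentile-bootstrap interval of the
    inspection-reduction factor (random baseline / ranked inspections)."""
    rng = random.Random(seed)
    n = len(scores)
    factors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        s = [scores[i] for i in idx]
        y = [labels[i] for i in idx]
        if sum(y) == 0:
            continue  # no abnormal cases in this resample: factor undefined
        ranked = inspections_to_reach(s, y, target_recall)
        factors.append(expected_random_inspections(n, sum(y), target_recall) / ranked)
    factors.sort()
    lo = factors[int(0.025 * (len(factors) - 1))]
    hi = factors[int(0.975 * (len(factors) - 1))]
    return lo, hi


if __name__ == "__main__":
    # Toy scores and labels (invented for illustration only).
    scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.05, 0.01]
    labels = [1, 0, 1, 0, 0, 1, 0, 0]
    ranked = inspections_to_reach(scores, labels, 0.5)          # 3
    rand = expected_random_inspections(len(labels), 3, 0.5)     # 4.5
    print(rand / ranked)                                        # 1.5
    print(bootstrap_reduction(scores, labels, 0.5, n_boot=200, seed=1))
```

Weighting the loss, rather than duplicating or synthesizing minority examples, leaves the training distribution untouched, which is the property the abstract attributes to the cost-sensitive approach. The reduction factor compares how many customers must be visited to reach a given detection rate under ranking versus random selection.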
Chaves et al. studied this question.