Abstract
Underwater video transects are crucial for assessing marine biodiversity and biomass. Counting fish in these videos is labour‐ and time‐intensive and operator‐dependent. Automating this step would yield unbiased biodiversity data and decouple the data collection campaign from field constraints. To develop an automated process, we compared the standard video counting method with three new computer vision–based approaches that use single‐frame detections, resulting in a holistic, fully automated pipeline for measuring fish abundance. In addition to the commonly used method N_max, we included (i) a 1D k‐means clustering method, (ii) an intuitive clustering approach, N_Heuristic, and (iii) a Temporal Convolutional Neural Network (TCN) counting method. We tested these methods on three Mediterranean species from different ecological niches. We first assessed the methods using manually labelled detections (ground‐truth detections) and then incorporated a detector into the pipeline for a more realistic scenario. The results showed that N_max consistently underestimated fish counts. The other methods performed better overall, with the N_Heuristic and TCN methods closest to manual evaluation. Because their absolute variation was comparable to inter‐operator variation, we conclude that these are reliable methods for quantifying fish counts for these three Mediterranean species. The parameters of these automated methods could be adjusted to suit other species and then be used in monitoring programmes, for example to assess biodiversity and biomass in marine protected areas over time.
Bürgi et al. studied this question.
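To make the baseline concrete, the following is a minimal sketch of how N_max could be computed from single‐frame detections. It assumes the common definition of N_max (often written MaxN): the largest number of individuals of a species visible in any single frame of a clip. The abstract does not spell out this definition, and the function name `n_max` and the example counts are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def n_max(frame_counts: Sequence[int]) -> int:
    """Return the largest per-frame detection count (0 for an empty clip).

    Assumes the common definition of N_max: the maximum number of
    individuals of one species visible in any single frame.
    """
    return max(frame_counts, default=0)


# Hypothetical per-frame detection counts for one species in one clip.
counts = [0, 1, 2, 2, 1, 0, 0, 1, 1, 0]
print(n_max(counts))
```

Under this definition, individuals that never share a frame with the largest group are not counted. If the fish in the later frames of the example were a third individual, the result would still be 2, which is consistent with the underestimation reported for N_max.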