Summary
Video analytics (VA) systems increasingly rely on deep-neural-network-based object detection, whose inference accuracy depends strongly on video compression parameters such as resolution, bitrate, and quantization. This paper presents a benchmarking study of five object detection models (Fast R-CNN, EfficientDet, YOLOv5, YOLOv8, and DETR), each evaluated at three encoder-defined video quality levels (High, Medium, and Low). Performance was assessed using bitrate (Mbps), peak signal-to-noise ratio (PSNR, dB), and detection accuracy, providing a reproducible framework for analyzing compression-induced performance variations. Experimental results show that PSNR scales approximately linearly with bitrate, whereas detection accuracy saturates, with diminishing returns at higher bitrates. YOLOv5 was the most robust to compression, followed by Fast R-CNN and DETR, whereas EfficientDet and YOLOv8 were more sensitive to quality degradation. We identify practical operating points that balance accuracy and bandwidth efficiency, providing actionable guidance for model-codec co-design in surveillance and smart-city VA applications.
Masykuroh et al. studied this question.
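
For readers unfamiliar with the PSNR metric reported in the summary, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), for 8-bit samples. The function name `psnr`, the flat-list input format, and the default peak value of 255 are illustrative assumptions; the summary does not state how the study computed PSNR (for example, per frame, per channel, or averaged over a clip).

```python
import math


def psnr(reference, degraded, max_value=255.0):
    """Return PSNR in dB between two equal-length sequences of pixel values.

    Hypothetical helper for illustration only; not the study's own tooling.
    """
    if len(reference) != len(degraded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [52, 55, 61, 66]      # made-up sample values
    compressed = [53, 54, 62, 65]    # each sample off by one
    print(f"PSNR: {psnr(original, compressed):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a fixed gain in dB corresponds to a multiplicative reduction in error, which is worth keeping in mind when comparing it against detection accuracy across quality levels.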