ABSTRACT Reliable crowd counting for IoT video analytics requires strong generalization across heterogeneous edge cameras. However, models trained on a labeled source domain often degrade on unseen cameras due to shifts in appearance, viewpoint, and density statistics. We propose MDANet, a deployment‐oriented framework for cross‐domain crowd counting that performs complementary alignment at three levels while keeping test‐time inference identical to a lightweight backbone. At the data level, Fourier Amplitude Mix reduces camera‐dependent style gaps by mixing low‐frequency amplitudes. At the feature level, global–local High‐Entropy Adversarial Regularization suppresses domain‐discriminative cues under spatial heterogeneity. At the domain level, Density‐Conditional Alignment modulates alignment strength according to predicted density to mitigate congestion‐dependent errors. Extensive experiments show that MDANet achieves competitive or state‐of‐the‐art accuracy with a favorable accuracy‐efficiency trade‐off, and additional evaluations under common stream degradations confirm its stability for edge deployment.
Bao et al. studied this question.
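To illustrate the data-level step named in the abstract, the following is a minimal sketch of low-frequency Fourier amplitude mixing as it is commonly done in the domain adaptation literature. It is not taken from MDANet itself: the function name `fourier_amplitude_mix`, the square low-frequency window controlled by `beta`, and the linear mixing weight `lam` are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def fourier_amplitude_mix(src, tgt, beta=0.1, lam=0.5):
    """Mix the low-frequency amplitude of `src` with that of `tgt`.

    Hypothetical helper, not the authors' implementation.
    src, tgt : real arrays of identical shape (H, W) or (H, W, C).
    beta     : assumed window half-width as a fraction of min(H, W), in [0, 0.5].
    lam      : assumed mixing weight; 0 keeps the source amplitude,
               1 swaps in the target amplitude inside the window.
    """
    src = np.asarray(src, dtype=np.float64)
    tgt = np.asarray(tgt, dtype=np.float64)
    if src.shape != tgt.shape:
        raise ValueError("src and tgt must have the same shape")
    if not 0.0 <= beta <= 0.5:
        raise ValueError("beta must lie in [0, 0.5]")

    # 2-D FFT over the spatial axes; shift the zero frequency to the centre.
    f_src = np.fft.fftshift(np.fft.fft2(src, axes=(0, 1)), axes=(0, 1))
    f_tgt = np.fft.fftshift(np.fft.fft2(tgt, axes=(0, 1)), axes=(0, 1))

    # Amplitude mostly carries style; phase mostly carries scene structure.
    amp_src, pha_src = np.abs(f_src), np.angle(f_src)
    amp_tgt = np.abs(f_tgt)

    # Square window centred on the zero-frequency bin.
    h, w = src.shape[:2]
    b = int(np.floor(min(h, w) * beta))
    ch, cw = h // 2, w // 2
    rows = slice(ch - b, ch + b + 1)
    cols = slice(cw - b, cw + b + 1)

    # Blend amplitudes inside the window only; keep the source phase everywhere.
    amp_mix = amp_src.copy()
    amp_mix[rows, cols] = (1.0 - lam) * amp_src[rows, cols] + lam * amp_tgt[rows, cols]

    f_mix = amp_mix * np.exp(1j * pha_src)
    out = np.fft.ifft2(np.fft.ifftshift(f_mix, axes=(0, 1)), axes=(0, 1))
    return out.real  # imaginary part is numerical noise for real inputs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source_img = rng.random((64, 64, 3))        # stand-in for a labeled source frame
    target_img = rng.random((64, 64, 3)) + 0.5  # stand-in for an unseen-camera frame
    mixed = fourier_amplitude_mix(source_img, target_img, beta=0.05, lam=0.5)
    print(mixed.shape)
```

Because only the amplitude inside the window is changed and the source phase is kept, the spatial layout of the crowd (and therefore the density-map annotation) is preserved while global appearance statistics move toward the target camera. In this sketch, `lam` would typically be sampled at random per training image, though that choice is also an assumption.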