Multi-view pedestrian detection and tracking in real deployments is often affected by missing views, partial occlusions, and image degradations, which break geometric consistency across cameras and destabilize BEV (bird's-eye view) features. A key limitation of many existing BEV pipelines is that they implicitly assume all cameras are continuously available and equally reliable, so the fused BEV features become fragile when this assumption is violated. We propose a geometry-aware spatio-temporal fusion framework that improves BEV stability under degraded views. Specifically, view-weighted fusion down-weights weak or missing cameras during BEV aggregation, a coordinate-guided attention decoder reinforces spatial continuity and suppresses corrupted regions, and a ConvGRU-based temporal BEV state buffers short-term interruptions to stabilize detection and association. Experiments on WildTrack and MultiviewX show comparable performance when all views are available, more graceful degradation under camera dropouts and large occlusions, and stable behavior under mild-to-moderate noise perturbations.
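The abstract does not specify how view weights are computed or how the ConvGRU is configured, so the following is only a minimal pure-Python sketch of two of the three components, not the authors' implementation. It assumes single-channel BEV grids, one scalar confidence per camera (a hypothetical stand-in for whatever reliability signal the method uses), and hand-picked rather than learned ConvGRU weights. The names `view_weighted_fusion`, `conv_gru_step`, and `PARAMS` are hypothetical. The coordinate-guided attention decoder is not sketched.

```python
import math
from typing import Dict, List, Optional

Grid = List[List[float]]  # single-channel BEV grid, indexed [row][col]


def zeros(h: int, w: int) -> Grid:
    return [[0.0] * w for _ in range(h)]


def view_weighted_fusion(feats: List[Optional[Grid]], conf: List[float],
                         h: int, w: int) -> Grid:
    """Confidence-weighted average of per-camera BEV grids (hypothetical scheme).

    A missing camera (feat is None) or a view with conf <= 0 is skipped, and the
    remaining weights are renormalized over the views that are still present.
    """
    present = [(f, c) for f, c in zip(feats, conf) if f is not None and c > 0]
    fused = zeros(h, w)
    total = sum(c for _, c in present)
    if total == 0:
        return fused  # all views missing: no new evidence this frame
    for f, c in present:
        for i in range(h):
            for j in range(w):
                fused[i][j] += (c / total) * f[i][j]
    return fused


def conv3x3(x: Grid, k: Grid) -> Grid:
    """Same-size 3x3 cross-correlation with zero padding (as in CNN layers)."""
    h, w = len(x), len(x[0])
    out = zeros(h, w)
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += k[di + 1][dj + 1] * x[ii][jj]
    return out


def kernel(center: float, neighbor: float = 0.0) -> Grid:
    k = [[neighbor] * 3 for _ in range(3)]
    k[1][1] = center
    return k


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def conv_gru_step(x: Grid, h: Grid, p: Dict) -> Grid:
    """One ConvGRU update of the temporal BEV state h given fused BEV input x."""
    H, W = len(x), len(x[0])

    def gate(wx, uh, b, act, hin):
        a, c = conv3x3(x, p[wx]), conv3x3(hin, p[uh])
        return [[act(a[i][j] + c[i][j] + p[b]) for j in range(W)] for i in range(H)]

    z = gate("Wz", "Uz", "bz", sigmoid, h)          # update gate
    r = gate("Wr", "Ur", "br", sigmoid, h)          # reset gate
    rh = [[r[i][j] * h[i][j] for j in range(W)] for i in range(H)]
    cand = gate("Wh", "Uh", "bh", math.tanh, rh)    # candidate state
    return [[(1 - z[i][j]) * h[i][j] + z[i][j] * cand[i][j]
             for j in range(W)] for i in range(H)]


# Hand-picked, illustrative weights (NOT learned): the negative update-gate bias
# keeps z small when the fused input is empty, so the state decays slowly.
PARAMS = {
    "Wz": kernel(4.0), "Uz": kernel(0.0), "bz": -2.0,
    "Wr": kernel(0.0), "Ur": kernel(0.0), "br": 0.0,
    "Wh": kernel(2.0), "Uh": kernel(1.0, 0.1), "bh": 0.0,
}

if __name__ == "__main__":
    H = W = 3
    person = zeros(H, W)
    person[1][1] = 1.0  # one pedestrian at the center cell
    state = zeros(H, W)
    # Frame 1: both cameras see the pedestrian.
    fused = view_weighted_fusion([person, person], [0.9, 0.6], H, W)
    state = conv_gru_step(fused, state, PARAMS)
    # Frame 2: both cameras drop out, so the fused input is empty.
    fused = view_weighted_fusion([None, None], [0.0, 0.0], H, W)
    state = conv_gru_step(fused, state, PARAMS)
    print(round(state[1][1], 3))
```

With these toy weights the center cell drops only slightly across the dropout frame (from roughly 0.85 to roughly 0.80), which illustrates how a gated temporal state can bridge a short interruption instead of losing the detection outright. Renormalizing over the remaining views in the fusion step is one simple way to keep a dropped camera from diluting the fused feature; a learned per-view weighting would be a natural alternative.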
Jiang et al. (Thu,) studied this question.