What question did this study set out to answer?

This research aims to improve the localization accuracy of cross-view models trained in one region when they are applied to a different region, using domain adaptation techniques.

June 20, 2026

Domain Adaptation for Cross-View Localization via Multi-Teacher Knowledge Distillation

Key Points

This research aims to improve the localization accuracy of cross-view models trained in one region when they are applied to a different region, using domain adaptation techniques.
Multiple pre-trained teacher models predict localization coordinates for the target area (multi-teacher knowledge distillation).
A learning-free cross-view instance matching and view alignment (CVMA) module evaluates the predictions and selects the best one as pseudo-ground truth.
The approach is validated experimentally on the VIGOR benchmark with three state-of-the-art models.
Localization performance in the target area improves significantly after adaptation.
Used on its own as a fine-grained localization method, the CVMA module performs comparably to some learning-based methods.

Abstract

Cross-view fine-grained localization estimates a ground camera's pixel-level coordinates in aerial images by analyzing visual correspondences between views. Recent studies have made significant progress in this task, but when models trained in a source area are directly applied to a new target area, their localization performance often degrades significantly due to the domain gap between the two areas. Moreover, obtaining accurate ground truth (GT) for the target area to retrain the models is prohibitively expensive. To adapt the localization model to the target area, this paper proposes a weakly supervised learning approach based on multi-teacher knowledge distillation. This approach uses multiple pre-trained teacher models to make predictions for the target area and employs a learning-free cross-view instance matching and view alignment (CVMA) module to evaluate the quality of the predicted coordinates from geometric, semantic, and visual perspectives. Based on the evaluation results, the best prediction is selected as pseudo-GT, and potentially anomalous training samples are filtered out. The CVMA module also functions as a learning-free fine-grained localization method, achieving performance comparable to some learning-based methods. Our approach is validated on the VIGOR benchmark using three state-of-the-art models, and experimental results show that our method significantly improves the localization performance of models in the target area.
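The pseudo-GT selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the `score_fn` callable standing in for the CVMA quality evaluation, and the `min_score` threshold used to filter anomalous samples are all assumptions made for illustration. The abstract does not say how the geometric, semantic, and visual evaluations are combined or how anomalies are detected, so here they are reduced to a single score per prediction.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# (x, y) pixel location of the ground camera in the aerial image.
Coord = Tuple[float, float]


def select_pseudo_gt(
    predictions: Sequence[Coord],
    score_fn: Callable[[Coord], float],
    min_score: float,
) -> Optional[Coord]:
    """Pick the best-scoring teacher prediction as pseudo-GT.

    score_fn is a hypothetical stand-in for a CVMA-style quality score
    (higher is better). Returns None when even the best prediction
    scores below min_score, i.e. the sample is treated as anomalous.
    """
    if not predictions:
        return None
    best_score, best_pred = max(
        ((score_fn(p), p) for p in predictions), key=lambda sp: sp[0]
    )
    return best_pred if best_score >= min_score else None


def build_pseudo_labels(
    samples: Sequence[Any],
    teachers: Sequence[Callable[[Any], Coord]],
    score_fn: Callable[[Any, Coord], float],
    min_score: float,
) -> List[Tuple[Any, Coord]]:
    """Run every teacher on every target-area sample and keep only the
    samples whose best prediction passes the quality threshold."""
    labelled = []
    for sample in samples:
        preds = [teacher(sample) for teacher in teachers]
        pseudo = select_pseudo_gt(
            preds, lambda p: score_fn(sample, p), min_score
        )
        if pseudo is not None:
            labelled.append((sample, pseudo))
    return labelled
```

The resulting `(sample, pseudo_gt)` pairs would then serve as training targets when adapting a model to the target area. In this sketch, taking the single best prediction rather than an average of teachers follows the abstract's wording that "the best prediction is selected as pseudo-GT".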
