Visual localization is critical for mobile robots that must estimate their position and orientation in GPS-denied environments. However, its efficiency, robustness, and generalization are undermined by severe viewpoint changes and dramatic appearance variations, which pose persistent challenges for image-based feature representation and pose estimation under real-world conditions. Recently, map-free visual relocalization (MFVR) has emerged as a promising paradigm for lightweight deployment and privacy isolation on edge devices. Yet learning compact and invariant image tokens without relying on structural 3D maps remains a core problem, particularly in highly dynamic or long-term scenarios. To address these issues, we propose the Debiased Multiplex Tokenizer (termed DMT-Loc), a novel method for efficient and versatile MFVR. Specifically, DMT-Loc is built upon a pretrained vision Mamba encoder and integrates three key modules for relative pose regression. First, Multiplex Interactive Tokenization yields robust image tokens with non-local affinities and cross-domain descriptions. Second, Debiased Anchor Registration facilitates anchor token matching through proximity graph retrieval and autoregressive pointer attribution. Third, Geometry-Informed Pose Regression equips multi-layer perceptrons with a symmetric swap gating mechanism inside each decoupled regression head to support accurate and flexible pose prediction in both pair-wise and multi-view modes. Extensive evaluations on seven public datasets demonstrate that DMT-Loc substantially outperforms existing baselines and ablation variants in diverse indoor and outdoor environments.
Wang et al. (Mon,) studied this question.