The rapid proliferation of GPU-free consumer edge robots, including household service units, smart vacuum cleaners, and autonomous indoor logistics platforms, demands navigation intelligence that operates under strict on-device computational constraints and severely limited perceptual access. Existing approaches face a fundamental dichotomy: classical local planners achieve near-perfect success rates but require exhaustive internal state bookkeeping that is incompatible with stateless embedded execution, whereas neural reinforcement learning (RL) agents run efficient perception–action loops yet fail catastrophically in non-convex obstacle fields by falling into an overconfidence trap, a state in which the policy remains confident while the robot is physically stuck and which conventional policy-entropy-based monitors cannot detect. We present a decoupled dual-brain co-navigation framework, pairing an RL agent with a large language model (LLM), that provides a computationally efficient representation for the cognitive planning tier of ARM-class consumer edge robots. Three tightly integrated innovations drive the system: (i) a black-box spatio-temporal stuck detector based solely on physical displacement monitoring, which consistently detects the overconfidence trap that entropy triggers miss; (ii) an ASCII topology serialisation interface that maps local radar observations to structured text grids natively aligned with the pretraining priors of code-specialised small language models, decoupling the cognitive tier from continuous low-level controllers; and (iii) a bottom-up fault-tolerant architecture augmented by a Trap Repulsion Field, wherein the RL physical layer absorbs spatial hallucinations from the top-level LLM, enabling system-level robustness without requiring an infallible oracle.
Evaluated across five difficulty levels against seven baselines, including artificial potential fields (APF), pure Deep Q-Network (DQN), pure Proximal Policy Optimisation (PPO), and three local graph-search methods, over 50 independent start–goal episodes per level (1,750 total evaluations), our framework achieves the highest success rate among all learning-based and reactive methods: 95% averaged across all levels, versus 72% for pure PPO, 9% for pure DQN, and 5% for APF, with a mean path cost of 50 steps on successful episodes. An edge deployment experiment on an Orange Pi board with a locally hosted 1.2B-parameter LLM (LFM-2.5) confirms 90% success on the hardest map with a mean inference latency of 37.82 s per call, establishing the viability of on-device cognitive-tier planning on mass-market, GPU-free consumer hardware.
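To make innovations (i) and (ii) concrete, the following is a minimal Python sketch, not the implementation evaluated here. It assumes a stuck state can be approximated as small net displacement over a sliding window of recent positions, and that a local observation is available as a binary occupancy grid. The names `DisplacementStuckDetector` and `serialise_grid`, the window length, the displacement threshold, and the glyphs `#`, `.`, `R`, and `G` are all hypothetical choices made for illustration.

```python
from collections import deque
from math import hypot


class DisplacementStuckDetector:
    """Flags a stuck state from positions alone (no access to policy internals).

    Assumption: "stuck" means the net displacement across the last `window`
    recorded positions is below `min_displacement`. Both values are illustrative.
    """

    def __init__(self, window=20, min_displacement=2.0):
        self.window = window
        self.min_displacement = min_displacement
        self._positions = deque(maxlen=window)

    def update(self, x, y):
        """Record the latest position; return True if the robot looks stuck."""
        self._positions.append((x, y))
        if len(self._positions) < self.window:
            return False  # not enough history yet
        x0, y0 = self._positions[0]  # oldest position still in the window
        return hypot(x - x0, y - y0) < self.min_displacement


def serialise_grid(occupancy, robot, goal=None):
    """Render a local occupancy grid (rows of 0/1) as an ASCII text grid.

    Hypothetical glyphs: '#' obstacle, '.' free cell, 'R' robot, 'G' goal.
    `robot` and `goal` are (row, col) tuples.
    """
    lines = []
    for r, row in enumerate(occupancy):
        chars = []
        for c, cell in enumerate(row):
            if (r, c) == robot:
                chars.append("R")
            elif goal is not None and (r, c) == goal:
                chars.append("G")
            else:
                chars.append("#" if cell else ".")
        lines.append("".join(chars))
    return "\n".join(lines)
```

In this sketch, a robot that oscillates between two adjacent cells keeps a near-zero net displacement and is flagged once the window fills, regardless of how confident its policy is. The text grid returned by `serialise_grid` could then be placed in the prompt of a small language model; how the actual system formats that prompt is not specified here.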
Cai et al. studied this question.