What question did this study set out to answer?

This research aims to develop a navigation system for consumer robots that operates effectively under limited computational resources and perceptual access.

June 3, 2026 | Open Access

Toward edge-intelligent consumer robots: a decoupled RL–LLM architecture for robust navigation under partial observability

Key Points

Implemented a decoupled RL–LLM dual-brain framework for robot navigation.
Introduced a spatio-temporal stuck detector to identify and manage overconfidence traps.
Evaluated across five difficulty levels against seven baselines, with 50 independent start–goal episodes per level (1,750 evaluations in total).
Achieved a 95% success rate averaged across all levels, the highest among learning-based and reactive methods, versus pure PPO (72%), pure DQN (9%), and APF (5%).
Mean path cost on successful episodes was 50 steps.
Demonstrated 90% success on the hardest map on an Orange Pi board running a locally hosted 1.2B-parameter LLM (LFM-2.5), with a mean inference latency of 37.82 s per call.
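
As an illustration of the stuck-detector idea above, here is a minimal sketch of a displacement-only monitor. It is not the authors' implementation: the class name `StuckDetector`, the sliding-window length `window`, and the threshold `radius` are hypothetical choices made for this example. The detector looks only at recent positions, never at policy outputs, so it treats the RL agent as a black box.

```python
from collections import deque
import math


class StuckDetector:
    """Flags a stuck state when the robot stays inside a small radius
    for a full window of steps.

    `window` and `radius` are hypothetical parameters for illustration,
    not values taken from the paper.
    """

    def __init__(self, window=20, radius=2.0):
        self.window = window
        self.radius = radius
        self.positions = deque(maxlen=window)  # most recent positions only

    def update(self, x, y):
        """Record the latest position; return True if the robot looks stuck."""
        self.positions.append((x, y))
        if len(self.positions) < self.window:
            return False  # not enough history yet
        x0, y0 = self.positions[0]  # oldest position in the window
        # Largest excursion from the oldest position within the window.
        max_excursion = max(math.hypot(px - x0, py - y0)
                            for px, py in self.positions)
        return max_excursion < self.radius


detector = StuckDetector(window=5, radius=2.0)
oscillating = [(0, 0), (1, 0), (0, 0), (1, 0), (0, 0)]
flags = [detector.update(x, y) for x, y in oscillating]
print(flags)  # [False, False, False, False, True]
```

An agent oscillating between two cells can keep issuing confident actions, which is why a displacement check of this kind can fire where an entropy-based trigger would not.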

Abstract

The rapid proliferation of GPU-free consumer edge robots—including household service units, smart vacuum cleaners, and autonomous indoor logistics platforms—demands navigation intelligence capable of operating under strict on-device computational constraints and severely limited perceptual access. Existing approaches suffer from a fundamental dichotomy: classical local planners achieve near-perfect success rates but require exhaustive internal state bookkeeping incompatible with stateless embedded execution, whereas neural reinforcement learning (RL) agents exhibit efficient perception–action loops yet catastrophically fail in non-convex obstacle fields through an overconfidence trap—a confident but physically stuck state invisible to conventional policy-entropy-based monitors. We present a decoupled RL–LLM dual-brain co-navigation framework that provides a computationally efficient representation for the cognitive planning tier of ARM-class consumer edge robots. Three tightly integrated innovations drive the system: (i) a black-box spatio-temporal stuck detector based solely on physical displacement monitoring, which consistently detects the overconfidence trap that entropy triggers miss; (ii) an ASCII topology serialisation interface that maps local radar observations to structured text grids natively aligned with the pretraining priors of code-specialised small language models, decoupling the cognitive tier from continuous low-level controllers; and (iii) a bottom-up fault-tolerant architecture augmented by a Trap Repulsion Field, wherein the RL physical layer absorbs top-level large language model (LLM) spatial hallucinations, enabling system-level robustness without requiring an infallible oracle. 
Evaluated across five difficulty levels against seven baselines, including artificial potential fields (APF), pure Deep Q-Network (DQN), pure Proximal Policy Optimisation (PPO), and three local graph-search methods, over 50 independent start–goal episodes per level (1,750 total evaluations), our framework achieves the highest success rate among all learning-based and reactive methods: 95% averaged across all levels, versus 72% for pure PPO, 9% for pure DQN, and 5% for APF, with a mean path cost of 50 steps on successful episodes. An edge deployment experiment on an Orange Pi board with a locally hosted 1.2B-parameter LLM (LFM-2.5) confirms 90% success on the hardest map with a mean inference latency of 37.82 s per call, establishing the viability of on-device cognitive-tier planning on mass-market, GPU-free consumer hardware.
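
To make the ASCII topology serialisation interface from the abstract more concrete, the sketch below turns a small local occupancy window into a text grid that could be placed in a language-model prompt. The function name `grid_to_ascii`, the 5×5 window, and the symbols `#` (obstacle), `.` (free) and `R` (robot) are assumptions made for illustration; the paper's actual encoding may differ.

```python
def grid_to_ascii(occupancy, robot_rc):
    """Render a local occupancy window as text, one grid row per line.

    occupancy: list of rows, each a list of 0 (free) or 1 (obstacle).
    robot_rc:  (row, col) of the robot inside the window.
    The symbol set is a hypothetical choice for this sketch.
    """
    lines = []
    for r, row in enumerate(occupancy):
        chars = []
        for c, cell in enumerate(row):
            if (r, c) == robot_rc:
                chars.append("R")
            elif cell:
                chars.append("#")
            else:
                chars.append(".")
        lines.append("".join(chars))
    return "\n".join(lines)


# A U-shaped (non-convex) obstacle with the robot inside the pocket.
local_view = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
text = grid_to_ascii(local_view, robot_rc=(2, 2))
print(text)
```

Because the planner only sees a text grid, it stays independent of whichever continuous low-level controller produces the motion, which is the decoupling the abstract describes.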
