This paper proposes a novel Large Language Model (LLM)-based visual target navigation framework for quadrotors in unknown environments. Leveraging the semantic knowledge of LLMs, the method enables autonomous exploration guided by natural language instructions. We design a planner driven by specialized prompt templates that operates in two phases: first, it derives a global search sequence via probabilistic inference; second, it dynamically generates sub-goal waypoints by fusing visual observations with statistical priors and LLM-derived scene relevance metrics. The quadrotor then carries out a progressive search using path planning algorithms. In simulation, the fused method outperforms single-modality baselines by approximately 20%. In physical flight experiments, it achieves success rates of 56% in the Cross-layout scenario and 48% in the T-shaped layout scenario. These results reflect the inherent challenges of perceptual occlusion and planning uncertainty, yet they support the feasibility and potential of the proposed framework in real-world applications.
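To make the two-phase planning idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that phase one reduces to ranking regions by an inferred probability of containing the target, and that phase two combines the three cues with a simple weighted sum. The names `Candidate`, `global_search_sequence`, `fused_score`, and `select_subgoal`, the weight values, and the linear fusion rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate sub-goal waypoint with three illustrative scores in [0, 1]."""
    name: str
    visual: float     # assumed: confidence from visual observation
    prior: float      # assumed: statistical prior for the target being nearby
    relevance: float  # assumed: LLM-derived scene relevance


def global_search_sequence(region_probs):
    """Phase 1 (sketch): order regions from most to least likely to hold the target."""
    return sorted(region_probs, key=region_probs.get, reverse=True)


def fused_score(candidate, weights=(0.4, 0.3, 0.3)):
    """Phase 2 (sketch): weighted sum of the three cues; the weights are assumptions."""
    w_visual, w_prior, w_relevance = weights
    return (w_visual * candidate.visual
            + w_prior * candidate.prior
            + w_relevance * candidate.relevance)


def select_subgoal(candidates, weights=(0.4, 0.3, 0.3)):
    """Pick the candidate waypoint with the highest fused score."""
    if not candidates:
        raise ValueError("no candidate waypoints to choose from")
    return max(candidates, key=lambda c: fused_score(c, weights))


if __name__ == "__main__":
    # Hypothetical region probabilities, e.g. parsed from an LLM response.
    order = global_search_sequence(
        {"kitchen": 0.6, "bedroom": 0.1, "living_room": 0.3}
    )
    print(order)

    # Hypothetical candidates seen from the current viewpoint.
    candidates = [
        Candidate("doorway_a", visual=0.9, prior=0.1, relevance=0.1),
        Candidate("doorway_b", visual=0.5, prior=0.8, relevance=0.8),
    ]
    print(select_subgoal(candidates).name)
```

In this sketch a candidate with a strong visual cue alone can lose to one that is moderately supported by all three cues, which is one plausible reading of why a fused score could outperform single-modality baselines. The actual fusion rule and weighting used in the paper may differ.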
Liu et al. (Sat,) studied this question.