Unmanned aerial vehicles (UAVs), particularly rotary-wing platforms such as quadcopters and octocopters, have evolved from controlled remote sensing platforms into autonomous agents capable of active task execution. This evolution from collect-then-analyze workflows to closed-loop perception, reasoning, and action signifies a paradigm shift toward Embodied AI, unlocking opportunities for the low-altitude economy. However, current research on UAV Embodied AI (UAV-EAI) often implicitly frames the field as a direct extension of indoor robotics or autonomous driving, overlooking the fundamental distinctions of aerial agents. To bridge this gap, we introduce a comparative framework contrasting UAV-EAI with Indoor Embodied AI (Indoor-EAI) and Autonomous Driving Embodied AI (AD-EAI). By systematically decomposing the domain into nine key dimensions, we (i) analyze core tasks such as perception, localization, and exploration; (ii) review enabling infrastructure, including simulators and datasets; and (iii) categorize modeling methods ranging from physics-centric control to cognition-centric models. Our analysis demonstrates that the convergence of a 6-DoF motion space, kilometer-scale unstructured environments, and stringent on-device constraints establishes a research regime qualitatively different from that of ground-based agents. These factors significantly impede the migration of existing VLM/LLM-based embodied systems to UAVs. Finally, we summarize open challenges and outline promising directions for the next generation of UAV-EAI.
Zhao et al. (Mon,) studied this question.