Abstract Embodied intelligence refers to an agent that interacts with its environment by perceiving, planning, making decisions, and executing actions as humans do; it is applicable to smart homes, drone inspection, and other domains. Embodied task planning, one of its main tasks, generates detailed step-by-step plans while perceiving the surrounding environment and understanding language instructions. Visual-language models, with their powerful multimodal representation capabilities, have been generalized to a variety of tasks. When applied to embodied task planning, however, they still face two challenges. First, the complexity of the environment makes it difficult to model global environment information. Second, frequent turns in task paths make planning dependent on strong spatial reasoning ability. To overcome these challenges, we propose PlanAgent, the first embodied visual-language model for embodied task planning. Specifically, an environment map is employed to model the global environment information, and an environment map encoder extracts task-related information from the environment. Further, to reduce the dependence of task path planning on strong spatial reasoning, we introduce a self-posture-aware training strategy that breaks long-term spatial reasoning down into short-term reasoning. We build the EmbodiedPlan-20k dataset for grounded planning in embodied tasks. Our experiments on this dataset demonstrate that PlanAgent outperforms previous methods and that all of its components are effective.
Yuanchang Yue
Fanglong Yao
Youzhi Liu
Chinese Academy of Sciences
University of Chinese Academy of Sciences
Aerospace Information Research Institute
Yue et al. (Wed,) studied this question.
www.synapsesocial.com/papers/68e64297b6db6435875d4388 — DOI: https://doi.org/10.21203/rs.3.rs-4513731/v1