What question did this study set out to answer?

The research aims to assess how close current multimodal large language models come to achieving human-level planning abilities.

February 14, 2026 | Open Access

EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning

Key Points

The authors introduce EgoPlan-Bench, a benchmark for evaluating the planning abilities of multimodal large language models (MLLMs).
The benchmark features realistic tasks, diverse action plans, and intricate visual observations.
Tasks are presented from an egocentric perspective to mirror human perception in real-world scenarios.
EgoPlan-Bench reveals significant challenges for multimodal large language models in task planning.
Evaluation indicates a substantial scope for improvement in MLLMs to reach human-level planning.
EgoPlan-IT, an instruction-tuning dataset, enhances model performance on the benchmark.

Abstract

The pursuit of artificial general intelligence (AGI) has been accelerated by Multimodal Large Language Models (MLLMs), which exhibit superior reasoning, generalization capabilities, and proficiency in processing multimodal inputs. A crucial milestone in the evolution of AGI is the attainment of human-level planning, a fundamental ability for making informed decisions in complex environments, and solving a wide range of real-world problems. Despite the impressive advancements in MLLMs, a question remains: How far are current MLLMs from achieving human-level planning? To shed light on this question, we introduce EgoPlan-Bench, a comprehensive benchmark to evaluate the planning abilities of MLLMs in real-world scenarios from an egocentric perspective, mirroring human perception. EgoPlan-Bench emphasizes the evaluation of planning capabilities of MLLMs, featuring realistic tasks, diverse action plans, and intricate visual observations. Our rigorous evaluation of a wide range of MLLMs reveals that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning. To facilitate this advancement, we further present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench. We have made all the codes, data, and a maintained benchmark leaderboard available at https://chenyi99.github.io/ego_plan/ to advance future research.