What type of study is this?

This is a quantitative study.

September 20, 2025 | Open Access

Educational Evaluation with MLLMs: Framework, Dataset, and Comprehensive Assessment

Key Points

MLLMs show good consistency with human evaluations in educational assessments.
The constructed multimodal dataset includes student essays, slide decks, and videos annotated by experts.
The four leading models evaluated (GPT-4o, Gemini 2.5, Doubao1.6, and Kimi 1.5) each show distinct strengths across the assessment dimensions.
The MLLMs show high explainability and perform better on text-based tasks than on visual tasks, but their scoring stability needs improvement.

Abstract

With the rapid development of Multimodal Large Language Models (MLLMs) in education, their applications have mainly focused on content generation tasks such as text writing and courseware production. However, automated assessment of non-exam learning outcomes remains underexplored. This study shifts the application of MLLMs from content generation to content evaluation and designs a lightweight and extensible framework to enable automated assessment of students’ multimodal work. We constructed a multimodal dataset comprising student essays, slide decks, and presentation videos from university students, which were annotated by experts across five educational dimensions. Based on horizontal educational evaluation dimensions (Format Compliance, Content Quality, Slide Design, Verbal Expression, and Nonverbal Performance) and vertical model capability dimensions (consistency, stability, and interpretability), we systematically evaluated four leading multimodal large models (GPT-4o, Gemini 2.5, Doubao1.6, and Kimi 1.5) in assessing non-exam learning outcomes. The results indicate that MLLMs demonstrate good consistency with human evaluations across various assessment dimensions, with each model exhibiting its own strengths. Additionally, they possess high explainability and perform better in text-based tasks than in visual tasks, but their scoring stability still requires improvement. This study demonstrates the potential of MLLMs for non-exam learning assessment and provides a reference for advancing their applications in education.
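To make the consistency and stability dimensions concrete, the sketch below shows one plausible way to quantify them. The abstract does not state which metrics the study used, so everything here is an assumption: consistency is taken as the Pearson correlation between model and expert scores, stability as the average per-item standard deviation across repeated scoring runs, and the function names and sample scores are hypothetical. Interpretability is not covered, since it typically requires qualitative review of the model's explanations.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def consistency(model_scores, human_scores):
    """Assumed consistency metric: correlation of model vs. expert scores."""
    return pearson(model_scores, human_scores)


def stability(repeated_runs):
    """Assumed stability metric: mean per-item standard deviation.

    repeated_runs is a list of runs; each run is a list with one score
    per student work, in the same order. Lower values mean more stable.
    """
    per_item_scores = zip(*repeated_runs)
    return mean(pstdev(scores) for scores in per_item_scores)


if __name__ == "__main__":
    # Hypothetical scores for one dimension (e.g. Content Quality);
    # these numbers are illustrative and do not come from the study.
    expert = [4.0, 3.0, 5.0, 2.0]
    model_runs = [
        [4.0, 3.5, 4.5, 2.0],
        [3.5, 3.0, 5.0, 2.5],
        [4.0, 3.0, 4.5, 2.0],
    ]
    # Average the repeated runs per item before comparing with experts.
    model_avg = [mean(scores) for scores in zip(*model_runs)]
    print("consistency:", round(consistency(model_avg, expert), 3))
    print("stability:", round(stability(model_runs), 3))
```

In practice the same two functions would be applied separately to each of the five educational dimensions and to each model, which yields the horizontal-by-vertical grid the abstract describes.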
