June 23, 2025 · Open Access

From Words to Wisdom: LLMs Summarizing Instructional Content

Abstract

This study explores the effectiveness of large language models (LLMs) in summarizing instructional video transcriptions, a key application in educational technology. We assessed nine LLMs using two prompts (a simple base prompt and an enhanced, structured prompt) across 62 instructional videos. Two evaluator models, gpt-4o-mini and gemini-1.5-flash, scored the summaries on seven criteria tailored to instructional content: overall structure, presence of examples, availability of sources, relevance, coherence, narration, and accuracy. Results showed notable performance differences, with models such as Mistral Large and Claude 3.5 Sonnet performing best, especially with the enhanced prompt. However, in some cases the enhanced prompt improved narrative quality at the expense of structural clarity. Evaluator bias was also observed: gpt-4o-mini assigned higher scores than gemini-1.5-flash, which highlights the need for multiple evaluators. These findings underscore the role of prompt design and model choice in educational LLM applications and suggest future research into optimizing prompts and standardizing evaluation methods.
