July 1, 2026

Advances in Generalized Large Language Models: Performance on Undergraduate Financial Accounting Tasks

Key Points

This study assesses whether large language models can independently complete the tasks in an undergraduate financial accounting course.
Three large language models were evaluated: ChatGPT 5, Gemini 2.5, and Claude Sonnet 4.5.
The models completed every assignment, exam, and project in ACCT 2301 at Lamar University without human prompting beyond the attachment of required work.
Model accuracy was compared with student performance in the same course.
All three models achieved final course averages exceeding 96%, compared to the student average of 78.4.
All three models showed near-perfect accuracy on structured tasks.
Model performance was weaker on the multi-step Jackson Cycle project, consistent with known limitations in long-horizon reasoning.

Abstract

This study evaluates whether generalized large language models (LLMs) can independently complete all tasks required in an undergraduate financial accounting course. Three leading LLMs (ChatGPT 5, Gemini 2.5, and Claude Sonnet 4.5) performed every assignment, exam, and project in ACCT 2301 at Lamar University without human prompting beyond the attachment of required work. Results show near-perfect accuracy on structured tasks, with all three models achieving final course averages exceeding 96%, compared to the student average of 78.4. Performance weaknesses emerged on the multi-step Jackson Cycle project, in line with prior artificial intelligence (AI) literature on long-horizon reasoning limitations. The study also discusses broader implications for accounting education, labor markets, assessment design, and AI governance.
