Programming learning environments generate rich interaction data through iterative code submissions and automated evaluation processes. This article presents CodeStream, a dataset of programming submissions collected from undergraduate computer science students during supervised problem-solving sessions using an automated assessment platform. The dataset contains 5,482 submissions from 202 users across 46 programming problems written in C, C++, and Java. Each submission record includes source code, programming language, final evaluation verdict, attempt order, and sequential verdict traces generated during test case evaluation. A linked problem-level component provides problem descriptions and associated evaluation test cases. The dataset preserves temporal relationships between users, problems, and attempts, enabling reconstruction of submission histories and analysis of iterative problem-solving behavior. CodeStream supports research in educational data mining, learning analytics, automated feedback systems, code analysis, and programming behavior modeling. Its attempt-level structure is particularly suitable for studying error correction patterns, learning progression, and sequential decision-making in novice programming contexts.
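As an illustration of how submission histories might be reconstructed from attempt-level records, the following minimal sketch groups submissions by user and problem and orders them by attempt. The field names (`user_id`, `problem_id`, `attempt`, `verdict`), the verdict labels, and the sample records are hypothetical and are not taken from the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical records: field names and verdict labels are illustrative
# assumptions, not the dataset's documented schema.
submissions = [
    {"user_id": "u1", "problem_id": "p1", "attempt": 2, "language": "C", "verdict": "Accepted"},
    {"user_id": "u1", "problem_id": "p1", "attempt": 1, "language": "C", "verdict": "Wrong Answer"},
    {"user_id": "u2", "problem_id": "p1", "attempt": 1, "language": "Java", "verdict": "Accepted"},
]


def build_histories(records):
    """Group records by (user, problem) and sort each group by attempt order."""
    histories = defaultdict(list)
    for rec in records:
        histories[(rec["user_id"], rec["problem_id"])].append(rec)
    for history in histories.values():
        history.sort(key=lambda r: r["attempt"])
    return dict(histories)


def verdict_sequence(history):
    """Return the ordered list of final verdicts for one user-problem history."""
    return [rec["verdict"] for rec in history]


histories = build_histories(submissions)
```

Under these assumptions, `verdict_sequence(histories[("u1", "p1")])` yields the ordered verdicts for one user on one problem, which is the kind of sequence that error correction or learning progression analyses would start from.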
Lina et al. (Fri,) studied this question.