Predicting student outcomes such as course completion or dropout is central to early intervention in online education. In this context, learning analytics systems must extract actionable insights from noisy, sequential, and weakly labeled interaction data. We revisit Reinforcement Learning-based Multiple Instance Learning (RL-MIL), a framework that learns a policy to select critical instances from bags of interactions while optimizing a downstream predictor. However, prior RL-MIL approaches rely on the Epsilon-Greedy strategy, which under-explores context and requires expensive post-hoc explanations (e.g., SHAP) to justify instance selection.

We propose two attention-based RL-MIL policies, Gated Attention and Multi-Head Attention, that directly embed context-aware selection into the policy network, providing intrinsic interpretability at inference time. Experiments on two versions of the Open University Learning Analytics Dataset (OULAD) show that while both attention-based methods match the predictive performance of the Epsilon-Greedy baseline (0.92 Macro F1 on the aggregated dataset; 0.80 on the full dataset), they reduce interpretation latency by up to two orders of magnitude and allow tunable trade-offs between sparsity and accuracy via attention-temperature control.

This work establishes the feasibility of analytical, efficient, and context-aware instance selection in weakly labeled educational settings, and provides a path toward scalable, interpretable RL-MIL frameworks that do not depend on costly post-hoc methods.
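To make the idea of temperature-controlled attention over a bag of interactions concrete, the following is a minimal sketch of gated attention scoring with a softmax temperature. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the policy architecture, so the gating form (a tanh branch multiplied by a sigmoid branch, as commonly used in attention-based MIL), the function name `gated_attention_weights`, the parameter names `V`, `U`, `w`, and the random toy data are all hypothetical.

```python
import numpy as np

def gated_attention_weights(H, V, U, w, temperature=1.0):
    """Return one attention weight per instance in a bag.

    H: (n_instances, d) instance embeddings (hypothetical toy input).
    V, U: (d, k) projection matrices; w: (k,) scoring vector.
    temperature: lower values give sparser (more peaked) weights.
    """
    # Gated score: tanh branch modulated elementwise by a sigmoid gate.
    gate = np.tanh(H @ V) * (1.0 / (1.0 + np.exp(-(H @ U))))
    scores = (gate @ w) / temperature
    # Numerically stable softmax over the instances in the bag.
    scores = scores - scores.max()
    exp = np.exp(scores)
    return exp / exp.sum()

# Toy bag of 6 instances with 4 features each (random, for illustration only).
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))
V = rng.normal(size=(4, 3))
U = rng.normal(size=(4, 3))
w = rng.normal(size=3)

sharp = gated_attention_weights(H, V, U, w, temperature=0.1)
flat = gated_attention_weights(H, V, U, w, temperature=10.0)

# Indices of the two instances with the highest attention weight.
top_k = np.argsort(sharp)[::-1][:2]
```

In a sketch like this, the weights themselves act as the explanation: reading off `sharp` or `top_k` requires no separate attribution pass, which is the kind of intrinsic interpretability the abstract contrasts with post-hoc methods such as SHAP. Lowering `temperature` concentrates mass on fewer instances, while raising it spreads the weights more evenly.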
Braakman et al. (Sat,) studied this question.