Reinforcement learning agents are evaluated almost exclusively by cumulative reward, a proxy that Skalse et al. (NeurIPS 2022) prove is mathematically hackable for any non-constant objective. Gao et al. (ICML 2023) document the empirical consequence: proxy reward rises while true performance peaks and then falls, a divergence invisible to any system tracking only the reward curve. No standard evaluation tool diagnoses this gap post-hoc, per-agent, without modifying the training pipeline. We introduce LearnLens, a Python package that computes a Learning Quality Score (LQS), a composite behavioral metric that decomposes agent behavior into four probes grounded in the Goodhart taxonomy: Generalization (G), Consistency (C), Hack Index (H), and Reasoning Alignment (R), combined as LQS = sqrt(G x C) x (1 - sqrt(H)) + 0.15 x R x (1 - sqrt(H)), capped at 1.0. In a controlled three-agent experiment, LQS correctly ranked agents by true quality where cumulative reward did not. In a GRPO experiment using Qwen2.5-3B-Instruct on a T4 GPU over 500 steps, an LQS-inspired penalty reduced Hack Index from 1.00 to 0.00 and raised LQS from 0.000 to 0.848, while cumulative reward increased by only 46.5%. LearnLens is pip-installable (pip install learnlens-rl), compatible with Gymnasium, Stable-Baselines3, and the OpenEnv ecosystem, and fully open-source.
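The LQS combination rule stated above can be sketched in a few lines of Python. This is a minimal illustration of the formula only, not the learnlens-rl API: the function name `learning_quality_score`, its parameter names, and the assumption that each probe value lies in [0, 1] are all hypothetical choices made for this sketch.

```python
import math


def learning_quality_score(g, c, h, r, r_weight=0.15):
    """Combine the four probe values into an LQS (hypothetical helper).

    Implements LQS = sqrt(G x C) x (1 - sqrt(H)) + 0.15 x R x (1 - sqrt(H)),
    capped at 1.0. Assumes every probe is already scaled to [0, 1].
    """
    for name, value in (("G", g), ("C", c), ("H", h), ("R", r)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"probe {name} must lie in [0, 1], got {value}")

    # Both terms share the same discount, which shrinks as the Hack Index grows.
    hack_discount = 1.0 - math.sqrt(h)

    # Geometric mean of Generalization and Consistency, discounted by H.
    base = math.sqrt(g * c) * hack_discount

    # Reasoning Alignment contributes a smaller, equally discounted bonus.
    bonus = r_weight * r * hack_discount

    # The score is capped at 1.0.
    return min(1.0, base + bonus)
```

Under these assumptions, a Hack Index of 1 drives the score to zero regardless of the other probes, which matches the reported LQS of 0.000 at a Hack Index of 1.00. The geometric mean also means that a zero in either Generalization or Consistency removes the first term entirely.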
Ajay Bandiwaddar (Tue,) studied this question.