We introduce a real-time closed-loop protocol in which each LLM generation is conditioned on a persistent ledger of binary accept/reject judgments accumulated within a session. The ledger state is updated by gradient descent on the binary labels, yielding a compact operator state that converges from cold start and transfers across four architecturally distinct LLMs; 19 of 20 independent runs learn the same preference directions. These findings suggest that persistent in-session conditioning can serve as a complementary interface-level mechanism to offline preference optimization, enabling real-time adaptation of generation dynamics without model weight updates.
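
As a rough illustration of how a ledger state might be updated by gradient descent on binary accept/reject labels, the sketch below fits a small weight vector with an online logistic-loss update. The abstract does not specify the loss, the feature representation, or the learning rate, so the logistic loss, the class name `PreferenceLedger`, the per-generation feature vectors, and `lr` are all assumptions made for illustration only; this is not the authors' implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class PreferenceLedger:
    """Hypothetical in-session ledger: stores judgments and a compact weight state."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim   # cold start: no preference learned yet
        self.lr = lr           # assumed learning rate, not taken from the source
        self.history = []      # persistent record of (features, accepted) pairs

    def predict(self, features):
        # Estimated probability that a generation with these features is accepted.
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, features)))

    def record(self, features, accepted):
        # One gradient-descent step on the logistic loss for a single binary label.
        # The gradient of the loss with respect to w is (p - y) * x.
        y = 1.0 if accepted else 0.0
        p = self.predict(features)
        self.w = [wi - self.lr * (p - y) * xi for wi, xi in zip(self.w, features)]
        self.history.append((tuple(features), accepted))


# Toy session: the user accepts generations with feature 0 and rejects feature 1.
ledger = PreferenceLedger(dim=2)
for _ in range(50):
    ledger.record([1.0, 0.0], accepted=True)
    ledger.record([0.0, 1.0], accepted=False)
```

In this toy session the weight on the accepted feature grows positive and the weight on the rejected feature grows negative, without touching any model weights. How such a state would then condition the next generation (for example, through the prompt) is not specified in the abstract and is left out here.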
Noah Damiani (Tue,) studied this question.