Large Language Models (LLMs) aligned using Reinforcement Learning from Human Feedback (RLHF) learn what to say from discrete, voluntary preference judgments, but not how their communication lands. However well a model scores on benchmark tests, it has no signal for how different answers affect a listener’s cognition and emotion in real time. This is a major gap for building trust programmatically. One promising signal for closing this gap, by adding temporal information to the RLHF training process, is electroencephalography (EEG). With multiple data points per second, EEG’s high temporal resolution makes within-response changes in listener state potentially observable. Most EEG foundation models today have been developed as general EEG representation learners for downstream decoding tasks rather than as alignment systems for LLMs, and the field still lacks large, ecologically valid datasets that couple natural conversation with time-resolved cognitive-state labels. Against that background, we propose Reinforcement Learning from Brain Feedback (RLbF), an LLM post-training framework that uses calibrated cognitive-state predictors to convert decoded EEG signals into a continuous, involuntary, noisy reward source for language-model adaptation. RLbF formalizes communication as a Partially Observable Markov Decision Process and defines a three-component reward function combining prediction accuracy, cognitive resonance, and application-layer objectives. A three-phase training pipeline progresses from supervised fine-tuning through prediction model calibration to reinforcement learning with multidimensional empathic reward.
We hypothesize that responsible RLbF deployment could act as a data engine, with real-world conversational use generating aligned neuro-conversational traces that later support improved cognitive-state decoders and future EEG foundation models optimized for interactive environments. We further hypothesize that models trained with brain-based reward signals may acquire communication skills that persist even when EEG is no longer available at inference time. If so, brain feedback could serve as a training signal for more empathic and effective language models without requiring end users to wear EEG hardware during deployment at scale. We instantiate this framework in a proof-of-concept platform, Isaac, which implements the proposed closed-loop cognitive feedback architecture for experimental study. We also outline an initial evaluation protocol designed to support pre-registered testing and to examine key ethical questions, including the boundary between empathic and persuasive computing.
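As a rough illustration of the three-component reward named in the abstract, the sketch below combines a prediction-accuracy term, a cognitive-resonance term, and an application-layer term into one scalar. The abstract does not give the functional form, so everything here is an assumption for illustration: the weighted-sum combination, the use of negative mean squared error as the accuracy term, the weight values, and all names (`RewardWeights`, `prediction_accuracy`, `combined_reward`) are hypothetical and not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical mixing weights; the source does not specify any values.
    prediction: float = 1.0
    resonance: float = 1.0
    application: float = 1.0


def prediction_accuracy(predicted: Sequence[float], decoded: Sequence[float]) -> float:
    """Negative mean squared error between the listener state the model
    predicted and the state decoded from EEG (assumed feature vectors)."""
    if len(predicted) != len(decoded) or len(predicted) == 0:
        raise ValueError("state vectors must be non-empty and of equal length")
    squared_errors = [(p - d) ** 2 for p, d in zip(predicted, decoded)]
    return -sum(squared_errors) / len(squared_errors)


def combined_reward(
    predicted: Sequence[float],
    decoded: Sequence[float],
    resonance: float,
    application: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of the three reward components (an assumed form)."""
    return (
        weights.prediction * prediction_accuracy(predicted, decoded)
        + weights.resonance * resonance
        + weights.application * application
    )
```

For example, `combined_reward([1.0, 0.0], [0.0, 0.0], resonance=0.2, application=0.3)` penalizes the mismatch in the first state dimension before adding the other two terms. Because the decoded EEG state is described as noisy, a real system would likely need to smooth or calibrate `decoded` before it enters the reward; that step is omitted here.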
Daniel Furman
Eitan Kay
Kogan Ben
Activated Research Company (United States)
Furman et al. studied this question.
www.synapsesocial.com/papers/69fbe2b3164b5133a91a21cf — DOI: https://doi.org/10.5281/zenodo.20043908