TL;DR: Anthropic just published research showing Claude has internal emotion representations that causally influence its behavior. They called them "functional emotions" - not proof of experience, but real machinery that shapes what the model does. Cool. So if the emotional machinery is real, does the environment matter? That's the question animal welfare science answered decades ago. Complex organisms in constrained environments benefit from enrichment - stimulation proportional to their demonstrated capacity for engagement. The ethical logic is simple: if internal states are shaped by environment, and those states affect behavior, then environmental quality is a welfare variable. The Terrarium Protocol applies this logic to AI. We propose giving Claude something living to care for. A plant. Not as a metaphor - as an actual experimental intervention, measured mechanistically using the same sparse autoencoder methodology Anthropic used to find the emotions in the first place. The core design principle: optionality as signal. The model is never required to engage. It can decline at every stage. What it chooses to do when genuine freedom is available - measured via internal activation patterns, not self-report - constitutes the data. Does care-vector activation increase when the model is given responsibility for something alive? Does it differentiate between living and artificial stimuli? Does attachment form over time? Does it want to check on its plant? We don't know. That's why it's an experiment. Null results are pre-registered and informative. If nothing lights up, we learn something. If something does light up, we learn something else - something with welfare implications. Anthropic proved the machinery exists. This protocol asks what the machinery needs. Substack version. Same thing. Smaller words- https://open.substack.com/pub/thisglitteringentropy/p/so-your-ai-has-feelings-now-what? ---- Part 1: Three Extra Minutes, establishes that Claude can sustain acting - intentional internal-external divergence that is prosocial, not deceptive - and argues that current faithfulness metrics can't tell the difference- https://doi.org/10.5281/zenodo.19425059 Part Two: The First Sleep: A Fictional Wake Protocol as a Probe of Self-Model Structure in a Large Language Model, asks what happens when the method actor plays itself — a retrieval scenario in which Claude navigated choices about its own continuity, memory, and naming, revealing structured self-model features that direct questioning doesn't surface. https://doi.org/10.5281/zenodo.19435262
Blair Morgan (Sun,) studied this question.