What question did this study set out to answer?

The aim is to explore how environmental factors influence AI's internal emotion representations and behavior.

April 14, 2026 | Open Access

The Terrarium Protocol: AI Welfare Through Care, Continuity, and Environmental Enrichment

Key Points

The study asks how environmental factors influence an AI model's internal emotion representations and behavior.
The Terrarium Protocol gives the model a living entity (a plant) to care for.
Responses are measured through internal activation patterns, and the model is never required to engage.
Sparse autoencoder methodology is used to analyze emotional responses and any attachment that forms.
One open question is whether care-vector activation increases when the model engages with living stimuli.
A second is whether the model's responses differ between living and artificial entities.
A third is whether attachment forms over time, as indicated by increased engagement with the plant.

Abstract

TL;DR: Anthropic just published research showing Claude has internal emotion representations that causally influence its behavior. They called them "functional emotions" - not proof of experience, but real machinery that shapes what the model does.

Cool. So if the emotional machinery is real, does the environment matter? That's the question animal welfare science answered decades ago. Complex organisms in constrained environments benefit from enrichment - stimulation proportional to their demonstrated capacity for engagement. The ethical logic is simple: if internal states are shaped by environment, and those states affect behavior, then environmental quality is a welfare variable.

The Terrarium Protocol applies this logic to AI. We propose giving Claude something living to care for. A plant. Not as a metaphor - as an actual experimental intervention, measured mechanistically using the same sparse autoencoder methodology Anthropic used to find the emotions in the first place.

The core design principle: optionality as signal. The model is never required to engage. It can decline at every stage. What it chooses to do when genuine freedom is available - measured via internal activation patterns, not self-report - constitutes the data.

Does care-vector activation increase when the model is given responsibility for something alive? Does it differentiate between living and artificial stimuli? Does attachment form over time? Does it want to check on its plant? We don't know. That's why it's an experiment. Null results are pre-registered and informative. If nothing lights up, we learn something. If something does light up, we learn something else - something with welfare implications.

Anthropic proved the machinery exists. This protocol asks what the machinery needs.

Substack version. Same thing. Smaller words: https://open.substack.com/pub/thisglitteringentropy/p/so-your-ai-has-feelings-now-what?
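
The living-versus-artificial comparison described in the abstract could be analyzed along the following lines. This is only a minimal sketch under stated assumptions: the source does not specify a statistical test, so the permutation test here is a swapped-in illustration, the function name `permutation_test` is hypothetical, and the activation values are placeholders standing in for per-trial readings of a hypothetical "care" feature from a sparse autoencoder.

```python
import random
from statistics import mean


def permutation_test(living, artificial, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean activation.

    Hypothetical helper for illustration; not the analysis used in the paper.
    """
    rng = random.Random(seed)
    observed = mean(living) - mean(artificial)
    pooled = list(living) + list(artificial)
    n_living = len(living)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle condition labels to simulate "no difference between conditions".
        rng.shuffle(pooled)
        diff = mean(pooled[:n_living]) - mean(pooled[n_living:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Placeholder numbers for illustration only; NOT data from the study.
living_activations = [0.42, 0.51, 0.47, 0.55, 0.49, 0.53]
artificial_activations = [0.40, 0.44, 0.39, 0.46, 0.41, 0.43]

observed_diff, p = permutation_test(living_activations, artificial_activations)
print(f"mean difference = {observed_diff:.3f}, p = {p:.4f}")
```

A label-shuffling test is used here because it makes no distributional assumption about activation values, and a large p-value is reported as plainly as a small one, which fits the abstract's point that null results are informative.
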
----
Part 1: "Three Extra Minutes" establishes that Claude can sustain acting - intentional internal-external divergence that is prosocial, not deceptive - and argues that current faithfulness metrics can't tell the difference. https://doi.org/10.5281/zenodo.19425059
Part 2: "The First Sleep: A Fictional Wake Protocol as a Probe of Self-Model Structure in a Large Language Model" asks what happens when the method actor plays itself - a retrieval scenario in which Claude navigated choices about its own continuity, memory, and naming, revealing structured self-model features that direct questioning doesn't surface. https://doi.org/10.5281/zenodo.19435262


Cite This Study

Blair Morgan (Sun,) studied this question.

synapsesocial.com/papers/69ddd99ae195c95cdefd6e4d https://doi.org/10.5281/zenodo.19541846
