June 27, 2026 · Open Access

Context Selection Has Been a Solved Problem Since 1951

Key Points

The essay argues that context selection in language models is fundamentally a sufficiency problem.
Explains context selection and compression in plain language, without the formal paper's machinery.
Draws on the work of the mathematicians David Blackwell (1951) and Lucien Le Cam (1964).
Uses Le Cam's deficiency to measure how much a selected context loses when the goal is to emulate the frozen model's output.
Treats every compressor as a garbling of the stream; Blackwell's theorem says when one garbling dominates another, which gives a partial order over compression methods.
Identifies attention as the statistic and prompt compression as the garbling.
Shows that the objects are the same in both settings; only the vocabulary changes.

Abstract

An expository companion to the paper "Context Selection as Approximate Sufficiency," written in plain language and without the paper's machinery. Every large language model has a context window, and the window is always too small; something must be left out, and the choosing is the whole game. The field solves this daily under many names — RAG, context engineering, long-context memory, prompt compression — and none of them cite the mathematician who settled the question in 1951. The essay makes the case that context selection is a sufficiency problem. David Blackwell's comparison of experiments (1951) gives the partial order over compression methods: every compressor is a garbling of the stream, and Blackwell's theorem says exactly when one garbling dominates another. Lucien Le Cam's deficiency (1964) gives the targeted measure of how much a selected context loses for the one decision that matters — emulating the frozen model's output. Attention is shown to be the statistic; prompt compression, the garbling. The objects are the same objects; only the vocabulary changes. This is the outer half of a two-part foundation, the inner half being the structure-function essay. It is meant to be readable by anyone who works with transformers, ahead of the formal paper.
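To make the word "garbling" concrete, here is a minimal sketch in Python. It is not taken from the essay or the paper. It assumes a toy finite setting in which an experiment is a row-stochastic matrix P(x | theta) and a compressor is a row-stochastic channel M(y | x). The matrices, the uniform prior, and the function names `garble` and `bayes_risk` are all hypothetical. The sketch checks one consequence of Blackwell's comparison for a single decision problem (guessing the state under 0-1 loss): the garbled experiment cannot achieve lower Bayes risk than the original.

```python
def garble(experiment, channel):
    """Compose an experiment P(x | theta) with a channel M(y | x), giving Q = P M."""
    n_in = len(channel)
    n_out = len(channel[0])
    return [
        [sum(row[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        for row in experiment
    ]


def bayes_risk(experiment, prior):
    """Minimum probability of error when guessing theta from one observation."""
    n_obs = len(experiment[0])
    # For each observation, the best guess keeps the largest joint mass.
    correct = sum(
        max(prior[t] * experiment[t][x] for t in range(len(prior)))
        for x in range(n_obs)
    )
    return 1.0 - correct


# Hypothetical "full context" experiment: rows are states, columns are observations.
full_context = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

# Hypothetical compressor: merges observations 1 and 2 into a single symbol.
compressor = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
]

prior = [1 / 3, 1 / 3, 1 / 3]  # assumed uniform prior over states

compressed = garble(full_context, compressor)
risk_full = bayes_risk(full_context, prior)
risk_compressed = bayes_risk(compressed, prior)

print(f"risk with full context:       {risk_full:.4f}")
print(f"risk with compressed context: {risk_compressed:.4f}")
```

In this toy case the compressor can no longer tell the second and third states apart, so the error probability rises. Blackwell's theorem is much stronger than this single check: it concerns every decision problem at once, and it characterizes dominance by the existence of such a channel.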
