The adoption of large language models (LLMs) in business operations has led to the proliferation of operational skills, i.e., codified, prompt-based instruction sets that LLMs execute repeatedly. As these skills grow in complexity through accumulated quality criteria and exception-handling rules, a practical challenge emerges: context bloat. This study provides an exploratory analysis of the relationship between context bloat and quality degradation in LLM operational skills, drawing on approximately four weeks of operational data from 22 skills at CC Company, a single-person organization, and on controlled experiments comprising 220 trials. The observational study revealed a positive correlation between the number of referenced files and the maximum initial non-conformance count (Spearman’s ρ = 0.562). A controlled experiment using the canary instruction method showed that increasing context size from approximately 4,000 to 12,000 tokens was associated with a decline in instruction compliance rate (Cliff’s δ = −0.480). These results are integrated within a Design Science Research framework to derive seven tentative design guidelines (P1–P7). In particular, we propose the concept of separation of context, a decomposition criterion that uses whether processing resides inside or outside the LLM’s context window as the basis for design decisions, as a design principle distinct from the traditional software-engineering notion of “separation of concerns.”
Kenji Kuhara studied this question.
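
For readers unfamiliar with the two effect sizes reported above, the following is a minimal sketch of how Spearman’s ρ and Cliff’s δ can be computed. It is not the study’s analysis code: the function names (`cliffs_delta`, `average_ranks`, `spearman_rho`) and the sample data are hypothetical, chosen only for illustration. It assumes ρ is computed as the Pearson correlation of tie-averaged ranks and δ as the difference between the proportions of cross-group pairs in which one group exceeds the other.

```python
from itertools import product
from statistics import mean


def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical data, NOT taken from the study.
    referenced_files = [1, 2, 2, 4, 5, 7]       # files referenced per skill
    nonconformance = [0, 1, 0, 2, 2, 3]         # max initial non-conformance count
    print("rho =", spearman_rho(referenced_files, nonconformance))

    small_context = [1, 1, 1, 0, 1]             # 1 = canary instruction followed
    large_context = [1, 0, 0, 1, 0]
    print("delta =", cliffs_delta(large_context, small_context))
```

The sign of δ depends on the order in which the two groups are passed: a negative value here means that compliance in the first group (the larger context) tends to be lower than in the second.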