April 24, 2026 · Open Access

Context Bloat and Quality Degradation in LLM Operational Skills: An Exploratory Analysis Based on Observational Data and Controlled Experiments

Kenji Kuhara, Heisei International University

Key Points

This analysis aims to explore the relationship between context bloat and quality degradation in LLM operational skills.
Analyzed approximately four weeks of operational data from 22 skills at a single-person organization.
Conducted 220 controlled trials using the canary instruction method to assess impact on compliance rates.
Integrated findings within a Design Science Research framework to derive design guidelines.
Found a positive correlation between the number of referenced files and the maximum initial non-conformance count (Spearman’s ρ = 0.562).
Found that increasing context size from approximately 4,000 to 12,000 tokens was associated with lower instruction compliance rates (Cliff’s δ = −0.480).
Proposed seven design guidelines to mitigate context bloat's effects on operational skills.

Abstract

The adoption of large language models (LLMs) in business operations has led to the proliferation of operational skills: codified, prompt-based instruction sets that LLMs execute repeatedly. As these skills grow in complexity through accumulated quality criteria and exception-handling rules, a practical challenge emerges: context bloat. This study provides an exploratory analysis of the relationship between context bloat and quality degradation in LLM operational skills, drawing on approximately four weeks of operational data from 22 skills at CC Company, a single-person organization, and on controlled experiments comprising 220 trials. The observational study revealed a positive correlation between the number of referenced files and the maximum initial non-conformance count (Spearman’s ρ = 0.562). A controlled experiment using the canary instruction method confirmed that increasing context size from approximately 4,000 to 12,000 tokens was associated with a decline in instruction compliance rate (Cliff’s δ = −0.480). These results are integrated within a Design Science Research framework to derive seven tentative design guidelines (P1–P7). In particular, we propose separation of context as a design principle distinct from the traditional software-engineering notion of “separation of concerns”: a decomposition criterion that bases design decisions on whether processing resides inside or outside the LLM’s context window.
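
For readers unfamiliar with the two statistics reported above, the following is a minimal sketch of how Spearman’s ρ and Cliff’s δ are defined. It is not the study’s analysis code. The function names and the sample scores are hypothetical and chosen only for illustration; none of the numbers come from the paper’s data.

```python
from itertools import product
from statistics import mean


def cliffs_delta(xs, ys):
    """Cliff's delta: share of pairs with x > y minus share with x < y."""
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical compliance scores (illustrative only, not study data).
large_context = [1, 2, 2, 3]
small_context = [2, 3, 4, 4]
delta = cliffs_delta(large_context, small_context)

# Hypothetical file counts and non-conformance counts (illustrative only).
rho = spearman_rho([1, 2, 3, 4], [10, 20, 30, 40])
```

A negative δ means that scores in the first group tend to be lower than scores in the second, which is the direction reported for the larger context size. Both statistics are rank-based, so they make no assumption of normally distributed scores.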