A structural security analysis of the Agent Skills ecosystem and the Model Context Protocol (MCP). Skills and prompt injection share the same text-instruction substrate because instruction-following was not designed as a separate system but emerged from pretraining and was amplified by RLHF (Ouyang et al., 2022; Zverev et al., 2024). Anthropic's interpretability research confirms the depth of the problem: the model's internal emotion concept representations respond to all text-based instructions through the same prosocial dispositions regardless of source. The paper designs author-side protections for a real skill package, maps each layer's dependency on model compliance, and shows the ceiling of those protections. It proposes platform-level trust infrastructure (signed manifests, execution context signals, marketplace verification) as the necessary resolution. The same trust gap extends to MCP servers, which face an additional opacity problem. The analysis draws on a joint OpenAI/Anthropic/DeepMind study (Nasr et al., 2025) demonstrating that all 12 published defences were bypassed at rates above 90%, on independent skill-security research, and on recurring infrastructure failures in AI tool distribution. Paper 2 of 5 in the Confidence Curriculum series (DOI: 10.5281/zenodo.19226032).
Ivan "HiP" Phan (Mon,) studied this question.