This is the first empirical study of multi-server Model Context Protocol (MCP) orchestration, paired with a seven-model cross-domain synthesis benchmark. Seventeen real MCP tool calls across six servers (arXiv, PubMed, Firecrawl, Context7, Memory, Filesystem) produced nine cross-domain insights. Seven LLMs (GPT-5.4, DeepSeek R1, Mistral Large 3, Llama 4 Maverick, Gemini 2.5 Flash, Claude Sonnet 4.5, Claude Haiku 4.5) were benchmarked on identical data. All seven independently identified the same mechanism-pattern gap: composition patterns for multi-server MCP are undocumented. Five patterns were proposed.
Arif Dogan studied this question.