The Model Context Protocol (MCP) lets developers expose tools and data sources to LLM-based agents through a standardized interface. Despite rapid ecosystem growth, no methodology exists for evaluating whether a given MCP server improves agent task completion. We present mcpbr, an open-source benchmark runner that isolates the effect of MCP tool augmentation through paired comparison experiments. We evaluate a code graph analysis MCP server on all 500 tasks from SWE-bench Verified, using Claude Sonnet as the base agent. MCP augmentation reduced the resolution rate by 14.9% in relative terms (7.4 percentage points, from 49.8% to 42.4%) while improving efficiency: 42.3% fewer tool calls, 14.0% fewer tokens, and 15.2% lower cost. Per-repository analysis shows that the effect varies across codebases: the server helped on 1 of 12 repositories and hurt on 10. We analyze this efficiency-resolution tradeoff and show that MCP tools alter the agent's exploration strategy, trading general-purpose search for opinionated shortcuts that can narrow the solution space.
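The paired-comparison arithmetic behind these figures can be sketched as follows. This is a minimal illustration under stated assumptions, not mcpbr's implementation: the names `relative_change`, `summarize_paired`, and `PairedSummary` are hypothetical, and the per-task toy outcomes are invented solely to exercise the code. It shows why the 14.9% figure is a relative change: a drop from 49.8% to 42.4% is 7.4 percentage points, which is 14.9% of the baseline rate.

```python
from dataclasses import dataclass


def relative_change(before: float, after: float) -> float:
    """Change from `before` to `after`, expressed as a fraction of `before`."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


@dataclass
class PairedSummary:
    baseline_rate: float   # fraction of tasks resolved by the baseline agent
    treatment_rate: float  # fraction resolved by the MCP-augmented agent
    delta_pp: float        # absolute change, in percentage points
    relative: float        # change as a fraction of the baseline rate
    mcp_only: int          # tasks resolved only with MCP tools
    baseline_only: int     # tasks resolved only without MCP tools


def summarize_paired(results: dict[str, tuple[bool, bool]]) -> PairedSummary:
    """Summarize paired runs where results[task_id] = (baseline_ok, mcp_ok).

    Hypothetical helper: each task is attempted under both conditions,
    so the two outcomes for one task form a single pair.
    """
    n = len(results)
    if n == 0:
        raise ValueError("need at least one task")
    pairs = list(results.values())
    baseline_rate = sum(b for b, _ in pairs) / n
    treatment_rate = sum(t for _, t in pairs) / n
    return PairedSummary(
        baseline_rate=baseline_rate,
        treatment_rate=treatment_rate,
        delta_pp=(treatment_rate - baseline_rate) * 100,
        relative=relative_change(baseline_rate, treatment_rate),
        mcp_only=sum(1 for b, t in pairs if t and not b),
        baseline_only=sum(1 for b, t in pairs if b and not t),
    )


if __name__ == "__main__":
    # Resolution rates reported in the abstract (percent).
    # prints: -7.4 pp, -14.9% relative
    print(f"{42.4 - 49.8:.1f} pp, {relative_change(49.8, 42.4):.1%} relative")

    # Invented toy outcomes, only to show the paired bookkeeping.
    toy = {"a": (True, True), "b": (True, False), "c": (True, False), "d": (False, True)}
    print(summarize_paired(toy))
```

Because both conditions run on the same tasks, the discordant counts (`mcp_only`, `baseline_only`) show exactly where the two configurations disagree, which aggregate rates alone hide.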
Grey Newell
Georgia Institute of Technology
DOI: https://doi.org/10.5281/zenodo.18627369 (www.synapsesocial.com/papers/6992b4ad9b75e639e9b09add)