Large language model (LLM) agents that assist software development must accurately identify the relevant source code repositories in enterprise catalogs containing hundreds or thousands of entries. Existing approaches rely on either keyword search or embedding-based similarity alone, and each has a well-known failure mode: keyword search misses semantically equivalent terms, while vector search conflates superficially similar but functionally distinct services. We present a five-layer scope resolution algorithm that combines tag-based exact matching, full-text search, capability indexing, vector nearest-neighbor retrieval, and identifier re-scoring into a unified scoring framework. The algorithm employs compound-term boosting (a 3x multiplier for multi-word matches), square-root length normalization, and a dual-mode entity extraction pipeline that pairs synchronous regex patterns with asynchronous LLM-based semantic extraction. Deployed internally at a Fortune 50 retailer across an enterprise catalog of 871 application repositories, the algorithm achieves a 94.6% scope resolution success rate across 879 pipeline executions during early adoption (as of June 2026). It also resolves monorepo sub-applications that single-strategy approaches consistently miss.
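To make the scoring terms concrete, the following is a minimal sketch of how compound-term boosting and square-root length normalization could interact for a single text layer. It is an illustration under stated assumptions, not the deployed implementation: the function names `tokenize` and `score_entry`, the tokenization rule, and the choice to count exact token-sequence hits are all hypothetical. Only the 3x multiplier for multi-word matches and the square-root normalization come from the description above.

```python
import math
import re

# The 3x multiplier for multi-word (compound) matches named in the abstract.
COMPOUND_BOOST = 3.0


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_entry(query_terms, entry_text):
    """Score one catalog entry's text against a list of query terms.

    Each query term may be a single word or a multi-word phrase.
    A hit is an exact, contiguous token-sequence match in the entry text.
    """
    tokens = tokenize(entry_text)
    if not tokens:
        return 0.0

    raw = 0.0
    for term in query_terms:
        term_tokens = tokenize(term)
        n = len(term_tokens)
        if n == 0:
            continue
        # Slide a window of n tokens over the entry and count exact matches.
        hits = sum(
            1
            for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == term_tokens
        )
        # Multi-word terms receive the compound boost; single words do not.
        weight = COMPOUND_BOOST if n > 1 else 1.0
        raw += weight * hits

    # Square-root length normalization: long descriptions are damped,
    # but less aggressively than dividing by the full token count.
    return raw / math.sqrt(len(tokens))


if __name__ == "__main__":
    catalog = {
        "checkout-payments": "payment gateway service for checkout",
        "gateway-docs": "gateway documentation and payment notes for many teams",
    }
    query = ["payment gateway", "checkout"]
    ranked = sorted(
        catalog, key=lambda name: score_entry(query, catalog[name]), reverse=True
    )
    print(ranked)
```

In this sketch, an entry that contains the phrase "payment gateway" contiguously outranks one that merely mentions both words separately, which is the kind of distinction the compound boost is meant to capture. How the per-layer scores are combined across the five layers is not specified here and is left out.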
PAPA RAO Avasarala (Wed,) studied this question.