Accurate product-to-catalog invoice matching is a foundational internal control for financial oversight and audit quality, yet it is bottlenecked by inconsistent vendor descriptions and the resulting ‘long tail’ of supplier heterogeneity, driving costly manual reconciliation in Enterprise Resource Planning (ERP) environments. This study pursues three objectives: (i) to design a Retrieval-Augmented Generation (RAG) architecture that matches invoice line items to a product catalog under conditions of optical character recognition noise, vendor-specific abbreviations, and multilingual heterogeneity; (ii) to evaluate this architecture on three public entity resolution benchmarks against established lexical and dense retrieval baselines; and (iii) to assess its viability as a decision support system in a real accounts payable workflow with audit-trail requirements. To address (i), we introduce a novel ‘augment-both-sides’ strategy: large language models (LLMs) proactively enrich each catalog Stock Keeping Unit (SKU) with synonyms and alternative descriptions before vectorization, while invoice lines undergo runtime query expansion, and an LLM-based reranker produces the final Top-3 candidates. For (ii), evaluation on the Abt-Buy, Amazon-Google, and Walmart-Amazon datasets yields Top-3 Recall of 91.60% to 97.96%, matching or exceeding the strongest non-LLM baseline on every benchmark. For (iii), a production deployment on approximately 200 manually verified Greek invoice lines (proprietary dataset, anecdotal observation) yields a Top-3 hit rate of approximately 97%, consistent with the public-benchmark results. The architecture functions as a reliable intelligent decision aid, narrowing the search space from thousands of SKUs to a precise candidate set for structured human verification.
Dadopoulos et al. (Sun,) studied this question.
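
As an illustration only, the ‘augment-both-sides’ flow summarized in the abstract (catalog-side enrichment, query-side expansion, retrieval, reranking to a Top-3 list, and Top-3 recall) can be sketched in a few lines of Python. This is a minimal sketch under strong assumptions, not the authors' implementation: the toy catalog, the `CATALOG_SYNONYMS` and `ABBREVIATIONS` tables, and every function name are hypothetical; the static tables stand in for LLM-generated enrichment and expansion; bag-of-words cosine similarity stands in for dense vector retrieval; and a Jaccard-overlap score stands in for the LLM-based reranker.

```python
import math
import re
from collections import Counter

# Hypothetical toy catalog (SKU -> description); not data from the study.
CATALOG = {
    "SKU-001": "stainless steel hex bolt M8 x 40 mm",
    "SKU-002": "black ink cartridge for inkjet printer",
    "SKU-003": "A4 copy paper 80 gsm 500 sheets",
    "SKU-004": "nitrile examination gloves size L box of 100",
}
# Stand-in for offline, LLM-generated catalog enrichment (assumption).
CATALOG_SYNONYMS = {
    "SKU-001": "screw fastener inox",
    "SKU-002": "toner printer ink blk",
    "SKU-003": "photocopy paper ream",
    "SKU-004": "disposable gloves latex-free",
}
# Stand-in for runtime, LLM-based query expansion (assumption).
ABBREVIATIONS = {"ss": "stainless steel", "blk": "black",
                 "crtg": "cartridge", "ppr": "paper"}


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Bag-of-words vector; a stand-in for a dense embedding."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(catalog, synonyms):
    """Catalog side: enrich each SKU before vectorization."""
    return {sku: vectorize(desc + " " + synonyms.get(sku, ""))
            for sku, desc in catalog.items()}


def expand_query(line):
    """Query side: append expansions for known abbreviations."""
    tokens = tokenize(line)
    extra = [ABBREVIATIONS[t] for t in tokens if t in ABBREVIATIONS]
    return " ".join(tokens + extra)


def retrieve(index, query, k=10):
    """First-stage retrieval over the enriched index."""
    qv = vectorize(query)
    ranked = sorted(index, key=lambda sku: cosine(qv, index[sku]), reverse=True)
    return ranked[:k]


def rerank(line, candidates, catalog, top_n=3):
    """Stand-in reranker: Jaccard overlap with the original description."""
    q = set(tokenize(expand_query(line)))

    def score(sku):
        d = set(tokenize(catalog[sku]))
        return len(q & d) / len(q | d) if q | d else 0.0

    return sorted(candidates, key=score, reverse=True)[:top_n]


def match(line, index, catalog):
    """Full pipeline: expand, retrieve, rerank to a Top-3 candidate list."""
    return rerank(line, retrieve(index, expand_query(line)), catalog)


def top3_recall(labelled_lines, index, catalog):
    """Fraction of lines whose true SKU appears among the Top-3 candidates."""
    if not labelled_lines:
        return 0.0
    hits = sum(1 for line, sku in labelled_lines
               if sku in match(line, index, catalog))
    return hits / len(labelled_lines)


INDEX = build_index(CATALOG, CATALOG_SYNONYMS)
LABELLED = [("SS hex bolt M8", "SKU-001"),
            ("blk ink crtg", "SKU-002"),
            ("copy ppr A4 500", "SKU-003")]
```

Calling `match("blk ink crtg", INDEX, CATALOG)` returns a list of at most three SKU identifiers for a human reviewer to verify, and `top3_recall(LABELLED, INDEX, CATALOG)` computes the hit rate over a labelled sample. In a realistic setting each stand-in would be replaced by the corresponding model call, with the two-stage structure (broad retrieval, then a narrower reranked shortlist) left unchanged.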