What question did this study set out to answer?

This work aims to create a hybrid retrieval system that integrates both semantic and relational data for better analytics accuracy.

February 19, 2026 · Open Access

Hybrid Semantic and Relational Retrieval for Trustworthy Enterprise Analytics

Key Points

Introduced a hybrid vector-relational integration pattern.
Developed joinable evidence packs linking text passages to structured entities.
Formulated a hybrid scoring function combining dense similarity, sparse lexical matching, and constraint validity.
Created an orchestration algorithm that enforces policy tags and prompt-budget limits.
Established a reproducible evaluation framework to assess various retrieval modes.
Improved evidence coverage while maintaining constraint correctness and auditability.
Reduced hallucination risk in enterprise RAG analytics systems.
Demonstrated quality-latency and compute-freshness trade-offs across dense-only, SQL-only, and hybrid retrieval modes.

Abstract

Retrieval-Augmented Generation (RAG) enhances large language model (LLM) outputs by grounding responses in external evidence. In enterprise analytics environments, relevant evidence spans both unstructured sources, such as policies and incident reports, and structured systems, including customer records and transactional data. Purely vector-based retrieval provides strong semantic recall but cannot reliably enforce relational constraints such as entity scope or temporal validity. Conversely, SQL-centric retrieval guarantees predicate correctness but lacks robustness to paraphrased natural-language queries. This paper introduces a lightweight hybrid vector–relational integration pattern that unifies a relational data mart with a semantic index through joinable evidence packs: top-k text passages linked to structured entities and filtered using governed predicates. We formalize the system model, define a hybrid scoring formulation that combines dense similarity, sparse lexical matching, and constraint validity, and present an orchestration algorithm that enforces policy tags and prompt-budget limits. A reproducible evaluation framework demonstrates quality–latency and compute–freshness trade-offs across dense-only, SQL-only, and hybrid retrieval modes. By preserving traceability to both semantic excerpts and structured records within a unified governance and observability loop, the proposed approach improves evidence coverage while maintaining constraint correctness and auditability, thereby reducing hallucination risk in enterprise RAG analytics systems.
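The abstract names the ingredients of the hybrid score and the orchestration step but does not give their exact form, so the following is only a minimal sketch of how such a pipeline could look. The `Passage` fields, the weights `alpha` and `beta`, the multiplicative validity gate, and the greedy budget fill are all assumptions made for illustration, not the paper's formulation.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    text: str
    entity_id: str      # key joining the passage to a structured record
    dense: float        # dense (embedding) similarity, assumed in [0, 1]
    lexical: float      # sparse lexical score, assumed normalized to [0, 1]
    tags: frozenset = field(default_factory=frozenset)  # policy tags on the passage
    tokens: int = 0     # prompt cost of including the passage


def hybrid_score(p, valid_entities, alpha=0.6, beta=0.4):
    """Weighted dense + lexical score, gated by constraint validity (assumed form)."""
    valid = 1.0 if p.entity_id in valid_entities else 0.0
    return valid * (alpha * p.dense + beta * p.lexical)


def build_evidence_pack(passages, valid_entities, allowed_tags, k=3, token_budget=200):
    """Pick up to k passages that pass policy tags and fit the prompt budget."""
    # Policy check: every tag on the passage must be allowed for this request.
    permitted = [p for p in passages if p.tags <= allowed_tags]
    ranked = sorted(permitted, key=lambda p: hybrid_score(p, valid_entities), reverse=True)
    pack, used = [], 0
    for p in ranked:
        if hybrid_score(p, valid_entities) <= 0.0:
            break  # remaining passages fail the relational constraint
        if len(pack) < k and used + p.tokens <= token_budget:
            pack.append(p)
            used += p.tokens
    return pack


# Hypothetical data; valid_entities stands in for rows returned by a governed SQL predicate.
passages = [
    Passage("Refund policy for enterprise tier", "C1", 0.9, 0.5, frozenset({"public"}), 80),
    Passage("Incident report, billing outage", "C2", 0.95, 0.9, frozenset({"public"}), 60),
    Passage("Internal escalation notes", "C1", 0.8, 0.8, frozenset({"restricted"}), 50),
    Passage("Contract renewal terms", "C1", 0.6, 0.7, frozenset({"public"}), 150),
]
pack = build_evidence_pack(passages, {"C1"}, frozenset({"public"}))
print([p.text for p in pack])
```

In this toy run, the passage with the strongest semantic match is excluded because its entity falls outside the relational scope, the restricted passage is dropped by the policy check, and one in-scope passage is skipped because it would exceed the prompt budget. Each passage that survives keeps its `entity_id`, which is what makes the pack joinable back to structured records.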
