Enterprise document retrieval systems face a critical challenge: delivering high-quality search results while meeting strict privacy requirements such as regulatory compliance and complete data sovereignty. Existing solutions either rely on cloud-based APIs that violate privacy constraints or deploy large models (278 M+ parameters) that are impractical for on-premise deployment. We introduce HR + QDA (Hybrid Retrieval with Query-Document Attention), a privacy-preserving retrieval system that lets organizations deploy customizable document search while retaining complete control of their data. HR + QDA combines BM25 sparse retrieval with a lightweight 1.77 M-parameter cross-attention reranker that can be trained on-premise in 13 min with as few as 1,500 labeled examples. Evaluated on the NFCorpus nutrition domain (2,914 queries), HR + QDA achieves 91% of state-of-the-art performance (MRR 0.744 vs. 0.814) with 157× fewer parameters than BGE-reranker-base and 2.3× faster inference (29 ms vs. 68 ms). Crucially, it can be customized on private, domain-specific data without transmitting that data externally. Our results demonstrate that lightweight, privacy-preserving customization offers a practical path for enterprise retrieval deployment in domains including healthcare, finance, legal, and technical documentation, addressing the critical gap between cloud-based accuracy and on-premise privacy requirements.
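The two-stage design described in the abstract (BM25 candidate retrieval followed by a reranker) can be illustrated with a minimal sketch. This is not the authors' implementation: the `BM25` class, the `hybrid_search` function, the parameter defaults (`k1=1.5`, `b=0.75`), and the toy corpus are all assumptions made for illustration, and `toy_reranker` is a simple token-overlap placeholder standing in for the paper's 1.77 M-parameter cross-attention model, which is not reproduced here.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (an assumption; the paper's tokenizer is not specified here)."""
    return text.lower().split()


class BM25:
    """Plain Okapi BM25 scorer over a small in-memory corpus (illustrative only)."""

    def __init__(self, docs: Sequence[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / len(self.tokens)
        n = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for t in self.tokens for term in set(t))
        # The "+1" inside the log keeps every IDF value positive.
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tf[i], len(self.tokens[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int) -> List[int]:
        scored = [(i, self.score(query, i)) for i in range(len(self.tokens))]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        # Drop documents that share no term with the query.
        return [i for i, s in scored[:k] if s > 0]


def toy_reranker(query: str, doc: str) -> float:
    """Placeholder relevance score (Jaccard overlap), NOT a cross-attention model."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if (q | d) else 0.0


def hybrid_search(
    query: str,
    docs: Sequence[str],
    bm25: BM25,
    reranker: Callable[[str, str], float],
    k_candidates: int = 10,
    k_final: int = 3,
) -> List[int]:
    """Stage 1: BM25 picks candidates. Stage 2: the reranker reorders them."""
    candidates = bm25.top_k(query, k_candidates)
    reranked = sorted(candidates, key=lambda i: reranker(query, docs[i]), reverse=True)
    return reranked[:k_final]


# Hypothetical toy corpus, loosely in the nutrition domain.
DOCS = [
    "vitamin d supplements and bone health",
    "dietary fiber lowers cholesterol",
    "vitamin c and the common cold",
    "exercise improves bone density in older adults",
]

index = BM25(DOCS)
results = hybrid_search("vitamin d bone health", DOCS, index, toy_reranker)
print(results)
```

Everything in this sketch runs locally on in-memory data, which mirrors the on-premise constraint the abstract emphasizes. In a real deployment the `reranker` callable would be replaced by a trained model, and only the small BM25 candidate set would be passed through it.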
Tua et al. studied this question.