What type of study is this?

This is an experimental study.

October 3, 2025 | Open Access

CaGR-RAG: Context-aware Query Grouping for Disk-based Vector Search in RAG Systems

Key Points

CaGR-RAG reduces 99th percentile tail latency by up to 51.55% relative to the baseline.
The mechanism groups queries by shared cluster access patterns, which improves cache efficiency and reduces disk I/O.
Experimental results show a consistently higher cache hit ratio than the baseline.
Opportunistic cluster prefetching minimizes cache misses during transitions between query groups.

Abstract

Modern embedding models capture both semantic and syntactic structures of queries, often mapping different queries to similar regions in vector space. This results in non-uniform cluster access patterns in disk-based vector search systems, particularly in Retrieval-Augmented Generation (RAG) frameworks. While existing approaches optimize individual queries, they overlook the impact of cluster access patterns, failing to account for the locality effects of queries that access similar clusters. This oversight reduces cache efficiency and increases search latency due to excessive disk I/O. To address this, we introduce CaGR-RAG, a context-aware query grouping mechanism that organizes queries based on shared cluster access patterns. Additionally, it incorporates opportunistic cluster prefetching to minimize cache misses during transitions between query groups, further optimizing retrieval performance. Experimental results show that CaGR-RAG reduces 99th percentile tail latency by up to 51.55% while consistently maintaining a higher cache hit ratio than the baseline.
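The grouping idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not specify a similarity measure, grouping algorithm, or cache policy, so the Jaccard similarity, the greedy first-fit grouping, the 0.5 threshold, the LRU cache, and the toy cluster IDs below are all assumptions chosen for illustration. Prefetching is not modeled.

```python
from collections import OrderedDict


def jaccard(a, b):
    """Jaccard similarity between two collections of cluster IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_queries(cluster_lists, threshold=0.5):
    """Greedy first-fit grouping (an assumption, not the paper's algorithm).

    Each query joins the first group whose representative (its first member)
    has Jaccard similarity >= threshold; otherwise it starts a new group.
    Returns a list of groups, each a list of query indices.
    """
    groups = []
    for qi, clusters in enumerate(cluster_lists):
        for group in groups:
            if jaccard(cluster_lists[group[0]], clusters) >= threshold:
                group.append(qi)
                break
        else:
            groups.append([qi])
    return groups


def hit_ratio(order, cluster_lists, capacity):
    """Replay queries in `order` against an LRU cluster cache of `capacity`."""
    cache = OrderedDict()
    hits = total = 0
    for qi in order:
        for c in cluster_lists[qi]:
            total += 1
            if c in cache:
                hits += 1
                cache.move_to_end(c)  # mark as most recently used
            else:
                cache[c] = True  # simulated disk read, then cache the cluster
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


# Hypothetical workload: clusters each query would probe, in arrival order.
cluster_lists = [
    [1, 2, 3], [7, 8, 9], [1, 2, 4], [7, 8, 10], [1, 2, 3], [7, 8, 9],
]

groups = group_queries(cluster_lists, threshold=0.5)
arrival_order = list(range(len(cluster_lists)))
grouped_order = [qi for group in groups for qi in group]

print("groups:", groups)
print("arrival-order hit ratio:", hit_ratio(arrival_order, cluster_lists, 3))
print("grouped-order hit ratio:", hit_ratio(grouped_order, cluster_lists, 3))
```

In this toy workload the arrival order alternates between two disjoint cluster regions, so a three-entry cache never hits. Replaying the same queries group by group keeps the shared clusters resident and yields hits on the overlapping clusters. The numbers say nothing about the paper's measured results.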


Authors

Yeonwoo Jeong

Kyuli Park

Hyunji Cho
