Large language model (LLM) applications built on Retrieval-Augmented Generation (RAG) pipelines waste significant computational resources by injecting entire retrieved document chunks into the context window, even though the relevant content occupies only a small fraction of each chunk. I present Vibe Index, a sub-microsecond exact phrase matching engine built on Roaring Bitmaps and anchor-and-offset positional scanning. It achieves 112-nanosecond latency for exact phrase matching on single-match queries and indexes 50K tokens in 1.83 milliseconds, up to four orders of magnitude faster than embedding-based semantic search. The system returns exact token positions rather than document-level matches and uses approximately 0.5 MB of memory for 50K tokens. This enables injecting only ±50 tokens around each match instead of 1K–4K token chunks, which reduces token consumption by 93% and saves approximately 5.65 MB of KV cache VRAM per query for 7B-parameter models (cumulative savings of ~5.5 GB across 1000 queries). I evaluate the system on real Rust source code and demonstrate the algorithmic foundation for hybrid retrieval that combines BM25 document-level candidate selection with exact positional validation.
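The anchor-and-offset idea can be illustrated with a minimal sketch. This is not the Vibe Index implementation: plain Python sets stand in for Roaring Bitmaps, and the names `build_index`, `find_phrase`, and `context_window` are hypothetical. It also assumes the anchor is chosen as the rarest phrase token, which is one plausible choice rather than one the abstract confirms.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each token to the set of positions where it occurs.

    Assumption: a Python set stands in for a Roaring Bitmap of positions.
    """
    index = defaultdict(set)
    for pos, tok in enumerate(tokens):
        index[tok].add(pos)
    return index


def find_phrase(index, phrase):
    """Return sorted start positions where `phrase` occurs exactly."""
    if not phrase:
        return []
    postings = [index.get(tok) for tok in phrase]
    if any(p is None for p in postings):
        return []  # some phrase token never occurs in the corpus
    # Assumed anchor choice: the phrase token with the fewest occurrences.
    anchor_off = min(range(len(phrase)), key=lambda i: len(postings[i]))
    starts = []
    for pos in postings[anchor_off]:
        start = pos - anchor_off
        if start < 0:
            continue
        # Every other phrase token must sit at its offset from the start.
        if all((start + i) in postings[i] for i in range(len(phrase))):
            starts.append(start)
    return sorted(starts)


def context_window(tokens, start, phrase_len, radius=50):
    """Return the match plus up to `radius` tokens on each side."""
    lo = max(0, start - radius)
    hi = min(len(tokens), start + phrase_len + radius)
    return tokens[lo:hi]


SAMPLE = "fn main ( ) { let x = foo ( ) ; let y = foo ( x ) ; }".split()
SAMPLE_INDEX = build_index(SAMPLE)
```

With this sketch, `find_phrase(SAMPLE_INDEX, ["foo", "(", ")"])` yields token positions rather than a document-level hit, and `context_window` then selects only the tokens around a match for injection into the prompt.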
Mladen Popović (Mon,) studied this question.