June 1, 2026 | Open Access

Optimizing In-Memory Vector Database Performance via Using CXL Memory

Key Points

The central aim is to enhance in-memory vector database performance on NUMA-CXL hybrid memory platforms.
Developed a bandwidth-compute coupled memory allocation strategy for vector retrieval.
Established a three-tier memory access cost model based on DRAM and CXL characteristics.
Implemented bandwidth-aware data partitioning and NUMA-aware thread scheduling alongside a CXL prefetch pipeline.
Achieved up to 12× performance improvement under high concurrency.
Bandwidth-weighted segmentation and thread affinity accounted for an 11× improvement.
CXL prefetching reduced remote access overhead by about 10%.
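
As a rough illustration of bandwidth-aware partitioning, the sketch below splits a vector index range into contiguous segments whose sizes are proportional to each memory node's bandwidth. It is not the paper's implementation: the function name `partition_by_bandwidth`, the node labels, and the bandwidth figures are hypothetical placeholders, and a real system would use measured effective bandwidth per node.

```python
def partition_by_bandwidth(num_vectors, bandwidths):
    """Split [0, num_vectors) into contiguous segments, one per memory node,
    sized in proportion to that node's bandwidth.

    bandwidths: dict mapping a node label to its bandwidth (any consistent unit).
    Returns a dict mapping each node label to a (start, end) half-open range.
    """
    total = sum(bandwidths.values())
    segments = {}
    start = 0
    cumulative = 0.0
    nodes = list(bandwidths.items())
    for i, (node, bw) in enumerate(nodes):
        cumulative += bw
        if i == len(nodes) - 1:
            # The last node takes whatever remains, so rounding never drops vectors.
            end = num_vectors
        else:
            end = round(num_vectors * cumulative / total)
        segments[node] = (start, end)
        start = end
    return segments


# Hypothetical bandwidth figures, chosen only to make the arithmetic easy to follow.
example = partition_by_bandwidth(
    1_000_000,
    {"local_dram": 100.0, "remote_dram": 60.0, "cxl": 40.0},
)
```

With these placeholder numbers, the local DRAM node holds half of the vectors and the CXL node holds one fifth, so the slowest tier is given the least data to stream per query.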

Abstract

This paper proposes a bandwidth–compute coupled memory allocation and management strategy for high-dimensional vector retrieval on NUMA–CXL hybrid memory platforms. Based on empirical characterization of local DRAM, remote DRAM, and CXL nodes, a three-tier memory access cost model is established to guide a two-level optimization framework. The proposed approach integrates (i) bandwidth-aware data partitioning that assigns contiguous vector segments to each memory node in proportion to its measured effective bandwidth, and (ii) NUMA-aware, compute-coupled thread scheduling that co-locates execution with the corresponding data segment. A double-buffered CXL prefetch pipeline further reduces the impact of the low-bandwidth and high-latency CXL path by staging upcoming blocks into DRAM. Experiments on the FAISS-Flat index using the SIFT1M and GloVe datasets demonstrate up to 12× performance improvement under high concurrency, with bandwidth-weighted segmentation and thread affinity contributing 11×, and CXL prefetching reducing remote access overhead by about 10%.
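
The double-buffered prefetch idea can be sketched in a few lines: while one block is being scanned, a background worker stages the next block into a faster buffer. This is a minimal control-flow sketch under stated assumptions, not the paper's pipeline. The names `stage_block` and `scan_with_double_buffer` are hypothetical, the "CXL" store is an ordinary Python list, and the copy into DRAM is simulated with a list copy.

```python
from concurrent.futures import ThreadPoolExecutor


def stage_block(slow_store, start, end):
    # Stand-in for copying one block from CXL memory into a DRAM staging buffer.
    return list(slow_store[start:end])


def scan_with_double_buffer(slow_store, block_size, process_block):
    """Process slow_store block by block, staging block i+1 in the background
    while block i is being processed. Returns one result per block, in order."""
    results = []
    n = len(slow_store)
    if n == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        next_start = min(block_size, n)
        pending = pool.submit(stage_block, slow_store, 0, next_start)
        while pending is not None:
            current = pending.result()  # wait until the staged block is ready
            if next_start < n:
                # Start staging the following block before processing this one.
                next_end = min(next_start + block_size, n)
                pending = pool.submit(stage_block, slow_store, next_start, next_end)
                next_start = next_end
            else:
                pending = None
            results.append(process_block(current))
    return results


# Toy usage: "scan" each block by summing it.
block_sums = scan_with_double_buffer(list(range(10)), 4, sum)
```

Only two blocks are live at any time, the one being processed and the one being staged, which is what makes the scheme double-buffered. In a real system the staging step would be a memory copy from the CXL node into local DRAM, overlapped with distance computation on the current block.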
