What question did this study set out to answer?

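This study asks how GPU hash tables and skip lists can be designed to keep performing well when they oversubscribe GPU memory, a setting in which implementations based on unified virtual memory (UVM) suffer from data migration and thrashing overheads.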

April 12, 2026 | Open Access

Designing GPU Data Structures for Efficient Memory Oversubscription

Key Points

Developed two-level hierarchical designs for hash table and skip list data structures.
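Showed that data structures running under the unified virtual memory (UVM) model can suffer performance degradation from data migration and thrashing on irregular access patterns.
Designed an outer-level container that enables efficient jumps to the desired region of the data structure, and an inner container that holds the data being operated on and is sized for efficient CPU-GPU transfers.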
Evaluated the designs on a diverse set of input operation sequences, where they substantially outperformed optimized UVM baselines while supporting high degrees of GPU memory oversubscription.
Achieved a geometric mean speedup of 2.06× for the hash table and 2.37× for the skip list over baseline UVM implementations when used as key-value stores in metagenomics classification and k-mer counting applications.
Aimed to maximize access locality so that irregular access patterns are handled efficiently when the data structure oversubscribes GPU memory.
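
The speedups in the key points are reported as geometric means. As a reminder of what that statistic computes, here is a minimal Python sketch. The function name `geomean` and the sample values are illustrative assumptions and are not results from the study.

```python
import math


def geomean(speedups):
    """Geometric mean: the n-th root of the product of n positive values.

    Computed through logarithms so that long lists do not overflow.
    """
    if not speedups:
        raise ValueError("need at least one speedup")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedups must be positive")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical per-workload speedups, NOT measurements from the study.
example_speedups = [1.0, 2.0, 4.0]
print(geomean(example_speedups))
```

Unlike the arithmetic mean, the geometric mean is not dominated by a single large ratio, which is why it is commonly used to summarize speedups across workloads.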

Abstract

Efficient concurrent data structures are important building blocks for accelerating applications on GPUs. With the ever-increasing memory footprint of GPU workloads, data structures used by kernels can exceed global memory capacity. Using the unified virtual memory (UVM) model is a popular approach for kernels to oversubscribe GPU memory without the need for explicit memory management by a programmer. However, we show that data structures executing with UVM can suffer from performance degradation due to the high overheads associated with data migration and thrashing for irregular access patterns. In this paper, we propose two-level hierarchical designs for hash table and skip list data structures that aim to maximize access locality and handle use cases where the data structure oversubscribes GPU memory. The outer-level container enables efficient jumps to desired regions of the data structure, while the inner container allows operating on the data. The inner container is sized to facilitate efficient data transfers between the CPU and the GPU. Experimental results on a diverse set of input operation sequences show that our data structure designs substantially improve performance over optimized UVM baselines while supporting high degrees of GPU memory oversubscription. Importantly, our proposed design, when used to implement key-value stores in metagenomics classification and k-mer counting applications, achieves a geomean speedup of 2.06× for hash table and 2.37× for skip list over baseline UVM implementations.
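
The two-level idea described in the abstract can be illustrated with a small host-side sketch. This is not the authors' implementation: it is single-threaded Python rather than a concurrent GPU kernel, and the names `TwoLevelHashTable`, `InnerChunk`, and `CHUNK_SLOTS` are hypothetical. It only shows the division of labor, with an outer level that jumps to a region and fixed-capacity inner containers that hold the data and would serve as the unit of CPU-GPU transfer.

```python
from dataclasses import dataclass, field

# Hypothetical inner-container capacity. The abstract says the inner
# container is sized for efficient CPU-GPU transfers; 4 is only for the demo.
CHUNK_SLOTS = 4


@dataclass
class InnerChunk:
    """Fixed-capacity inner container holding key-value pairs."""
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def find(self, key):
        """Return the slot index of key, or -1 if it is absent."""
        for i, k in enumerate(self.keys):
            if k == key:
                return i
        return -1


class TwoLevelHashTable:
    def __init__(self, num_regions):
        # Outer level: one entry per region, each a chain of inner chunks.
        self.regions = [[InnerChunk()] for _ in range(num_regions)]

    def _region(self, key):
        # Outer-level jump: hash the key to select a region.
        return self.regions[hash(key) % len(self.regions)]

    def insert(self, key, value):
        chunks = self._region(key)
        # Update in place if the key already exists in this region.
        for chunk in chunks:
            i = chunk.find(key)
            if i >= 0:
                chunk.values[i] = value
                return
        # Otherwise use the first chunk with a free slot.
        for chunk in chunks:
            if len(chunk.keys) < CHUNK_SLOTS:
                chunk.keys.append(key)
                chunk.values.append(value)
                return
        # All chunks are full: chain a new inner container to the region.
        chunks.append(InnerChunk([key], [value]))

    def get(self, key, default=None):
        for chunk in self._region(key):
            i = chunk.find(key)
            if i >= 0:
                return chunk.values[i]
        return default


table = TwoLevelHashTable(num_regions=8)
for k in range(100):
    table.insert(k, k * k)
print(table.get(7), table.get(1000))
```

In this sketch, every operation on a key touches only the chunks of one region, which is the locality property the two-level design aims for. How the actual design lays out, sizes, and migrates inner containers is not specified in the abstract.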
