October 17, 2021 · Open Access

TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory

Abstract

Personalized recommendation systems are gaining significant traction due to their industrial importance. An important building block of these systems is the embedding layer, which is highly memory-intensive. The fundamental primitive of an embedding layer is a gather of embedding vectors followed by a vector reduction; it has low arithmetic intensity and is bottlenecked by memory throughput. To tackle this challenge, recent proposals employ near-data processing (NDP) at the DRAM rank level, achieving impressive speedups. We observe that prior NDP solutions based on rank-level parallelism leave significant performance potential on the table, as they do not fully exploit the abundant transfer throughput inherent in DRAM datapaths.
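To make the gather-and-reduce primitive concrete, the following is a minimal host-side sketch in plain Python. It is not the paper's implementation and does not model NDP hardware. The function names, the toy table, and the assumption of 4-byte (FP32) elements are illustrative choices, not details taken from the abstract.

```python
from typing import List, Sequence


def gather_reduce(table: Sequence[Sequence[float]],
                  indices: Sequence[int]) -> List[float]:
    """Gather the rows named by `indices` and sum them element-wise."""
    dim = len(table[0])
    out = [0.0] * dim
    for idx in indices:          # gather: one row read per lookup
        row = table[idx]
        for j in range(dim):     # reduce: one addition per element
            out[j] += row[j]
    return out


def arithmetic_intensity(num_lookups: int, dim: int,
                         bytes_per_element: int = 4) -> float:
    """FLOPs per byte read for one gather-and-reduce.

    Assumes FP32 elements (4 bytes) and counts only the
    (num_lookups - 1) * dim additions needed to combine the rows;
    index reads and the output write are ignored.
    """
    flops = (num_lookups - 1) * dim
    bytes_read = num_lookups * dim * bytes_per_element
    return flops / bytes_read


# Hypothetical toy table: 8 rows of 4 elements, row r = [10r, 10r+1, 10r+2, 10r+3].
table = [[float(r * 10 + c) for c in range(4)] for r in range(8)]
pooled = gather_reduce(table, [1, 3, 6])
print(pooled)                        # [100.0, 103.0, 106.0, 109.0]
print(arithmetic_intensity(80, 64))  # 0.246875
```

Under these assumptions the intensity is (n - 1) / (4n) for n lookups, so it stays below 0.25 FLOP per byte regardless of the vector dimension. This is why the operation is limited by memory throughput rather than by compute.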