Data Flow Architectures for Data Processing on Modern Hardware

Key Points

Key points are not available for this paper at this time.

Abstract

The requirements arising from ever growing amounts of data and tight performance constraints as well as the limitations encountered in improving conventional CPU performance have led to a proliferation of specialized architectures involving a wide variety of processor types (GPU, TPU, DPU, etc.) with processing becoming distributed across all points of the computing fabric (smart storage, smart memory, smart NICs, programmable switches, etc.). Examples abound both in industry and academia of new architectural configurations and hardware accelerators improving different aspects of a system. These developments raise an important question that is still open but has not attracted sufficient attention: how to design data processing engines systems over such highly heterogeneous and distributed architectures. In this paper we argue that data management engines on modern hardware will necessarily be based on data flow designs where processing happens in a streaming and pipelined fashion across the entire architecture, a radical departure from existing engines. In the paper we argue why this will be the case, the advantages of such designs, and outline a research program to allow data processing engines take advantage of hardware developments.

Mark Helpful

Bookmark

Relay