We describe a comprehensive compilation approach for relational algebra, centered on an abstract loop-based intermediate representation (IR) that can express fused implementations of relational algebra operators on both sets and multisets. The loops are abstracted away from physical data structures thus making it easier to optimize. We show how to lower relational algebra query plan trees to the IR, including the complex operators that are used in production database systems, such as outer joins, non-equi joins, and differences. We then show how to compile this IR to efficient C++ code that co-iterates over the physical data structures present in relational algebra expressions. Finally, we show that our approach is portable across data structures. Since our approach can fuse across disparate operators, it achieves a 3.8x speedup (0.8--12.1x) compared to Hyper on selected LSQB benchmarks and worst-case optimal triangle queries. Our compiler also generates code of high quality: it has similar sequential performance to Hyper on TPC-H with a 1.0x speedup (0.4--4.3x) and is approaching their parallel performance with a 0.6x speedup (0.2--1.8x).
Dong et al. (Sun,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: