Aggregate queries often require computing large intermediate joins despite producing only small outputs. We identify broad classes of acyclic aggregate queries that can be evaluated without materialising any join results, using a bottom-up, semi-join–based propagation of cardinalities and partial aggregates. An implementation in Spark SQL shows that this approach is widely applicable and yields substantial performance gains on standard benchmarks.
Lanzinger et al. (Thu,) studied this question.