We present a garbage collection (GC) algorithm for multi-version concurrency control (MVCC) in database systems that support hierarchical copy-on-write branching. Traditional MVCC GC systems use single-dimensional retention policies (time-based or snapshot-based) and are unaware of branch hierarchies. When a parent branch garbage-collects old versions, child branches that depend on those versions through inheritance lose data access, causing query failures. Our algorithm computes a multidimensional GC horizon for each branch by taking the minimum log sequence number (LSN) across three dimensions: (1) the branch’s own retention period, (2) the creation LSNs of all descendant branches (protecting inherited data), and (3) active transaction snapshots across the branch lineage. We further integrate this branch-aware GC policy with LSM-tree compaction by filtering version records during SSTable merge operations. The algorithm achieves 100% correctness (zero data loss in multi-branchscenarios) with 10–30% storage savings versus a conservative “never delete” approach. We describe the architecture, algorithms, and implementation in HeliosDB, with per-branch observability via 10 Prometheus metrics. No prior database system combines hierarchical branch visibility with MVCC garbage collection and LSM compaction integration.
Daniel Moya Vaca (Thu,) studied this question.