Efficient hardware-supported packet scheduling is a cornerstone of fine-grained packet prioritization in modern networking systems. Algorithms such as Weighted Fair Queuing and Shortest Job First offer theoretical guarantees for achieving flow fairness and minimal flow completion times; however, their underlying priority-queue operation is difficult to implement at line rate due to its complexity. In this paper, we propose Dancing-Q, a line-rate dynamic approximation of a priority queue that is supported by current hardware and provides optimality guarantees. Essentially, Dancing-Q mimics the packet-sorting operation of a priority queue by distributing packets according to their ranks across multiple dynamically assigned FIFO queues. Key to the optimality guarantee are dynamic queue bounds that map ranges of packet ranks to the FIFO queues which are served using a strict priority scheduler. To minimize rank inversions, i.e., out-of-order scheduling when compared with an ideal priority queue, we introduce a per-queue local optimization strategy that rapidly drives queue bounds to optimal values based on the empirically observed packet ranks. Owing to the per-queue locality and a low memory and computation overhead, this optimization strategy allows prototyping Dancing-Q on programmable switches and SmartNICs. We demonstrate the priority queue emulation of Dancing-Q on a Tofino-based ASIC and use large-scale ns-3 experiments to show that compared to standard methods such as SP-PIFO, Dancing-Q (i) attains the optimal rank inversions, (ii) mimics the queuing delays of a priority queue on a significantly larger rank range, and (iii) curbs large flow starvation. Empirical evaluations show that Dancing-Q minimizes the flow completion time within a 4.4% overhead of the ideal priority queue.
Zhou et al. (Mon,) studied this question.