ABSTRACT This paper presents PPQSort (Pattern‐defeating Parallel Quicksort), a new parallel quicksort algorithm that provides high performance and ease of use. PPQSort uses C++ threads for parallelization, achieving efficient sorting without external libraries and allowing seamless integration across different computing environments. This paper describes novel quicksort optimizations, including branchless partitioning and their scalable parallel implementation. PPQSort is compared with existing parallel quicksort algorithms on two different CPU architectures (AMD EPYC, ARM A64FX) and with 7 different synthetic input data distributions. An analysis of its cache behavior is also provided. The results of the experimental evaluation demonstrate that PPQSort is fast and robust, consistently outperforming the fastest available parallel quicksort‐based implementations for all tested inputs.
Hévr et al. (Tue,) studied this question.