Abstract

Motivation
Community detection methods are applied to single-cell RNA sequencing (scRNA-seq) and mass cytometry data to efficiently identify major cell types and their subtypes, but their computational demands rise sharply as dataset sizes grow. The Leiden algorithm, an emerging method in this field, offers inherent parallelism that remains underutilized because of the limited parallel processing capability of modern multi-core CPUs, which have fewer than 100 cores (typically 32–64). Leiden can, however, achieve significant performance gains when implemented on GPUs, which offer high memory bandwidth and an extensive array of parallel processing units that map well to the parallelism in Leiden. As far as we know, cuGraph is the only implementation that has mapped the Leiden algorithm to GPUs, using a blend of Python and C. However, it supports only undirected graphs, potentially discarding the valuable information carried by edge directionality. In addition, this Python-based GPU implementation is slower than a C/C++ based one, which erodes part of the performance gain that GPU acceleration provides. A C/C++ based implementation, by contrast, optimizes performance more effectively and provides an accurate baseline for evaluating GPU acceleration.

Results
We developed gLeiden, a lightweight CUDA C++ based GPU implementation of the Leiden algorithm and, to the best of our knowledge, the first GPU implementation that supports directed graphs, which generally demand nearly twice the computational time and memory of undirected graphs. The results show that our directed gLeiden outperforms the directed cLeiden version, with 11x and 12x speedups on very large datasets. Our undirected ucLeiden and ugLeiden implementations significantly outperform the original Java version, with up to 42x speedup on large datasets. Compared with cuGraph, the undirected ugLeiden performs comparably on smaller datasets and is 58% faster on larger datasets. These results position our GPU-based Leiden implementation as a high-performance alternative to existing state-of-the-art community detection tools.

Availability
The source code and sample data are available at https://github.com/Beenishgul/Leiden and https://figshare.com/s/3b51e463a56e2a374bdf
Gul et al. studied this question.