What question did this study set out to answer?

The thesis aims to evaluate the performance and scalability of the Dask framework in different computing environments.

March 18, 2026 | Open Access

Performance and Scalability Analysis of Dask Applications on Large Scale Systems

Key Points

Evaluates the performance and scalability of the Dask framework across two contrasting computing environments.
Runs strong-scaling and weak-scaling experiments on TPC-H–like datasets of different scales.
Analyzes Dask’s adaptive autoscaling mechanism.
Integrates DuckDB as an embedded execution engine within Dask workers.
Compares a local Kubernetes deployment with a SLURM-managed HPC cluster.
The Kubernetes deployment scales poorly because of shared-resource contention, with performance saturating after only a few workers.
The HPC system maintains high parallel efficiency across many nodes and performs better on memory-bound workloads.
Adaptive scaling underperforms in both environments: it destabilizes long-running queries on Kubernetes and trails fixed-size clusters on the HPC system.
Dask combined with DuckDB is often three to four times faster than Dask DataFrame execution.
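
The strong-scaling and weak-scaling results mentioned above are usually summarized with speedup and parallel efficiency. The following is a minimal sketch of those standard textbook definitions, not the thesis's own tooling; the function names and the runtimes are hypothetical and are not measurements from the study.

```python
def strong_scaling(t1, tn, n):
    """Strong scaling: total problem size is fixed while workers increase.

    Returns (speedup, efficiency), where speedup = T(1) / T(n)
    and efficiency = speedup / n.
    """
    speedup = t1 / tn
    return speedup, speedup / n


def weak_scaling_efficiency(t1, tn):
    """Weak scaling: problem size grows in proportion to the worker count.

    Ideal runtime stays constant, so efficiency = T(1) / T(n).
    """
    return t1 / tn


# Hypothetical runtimes in seconds (illustrative only, not thesis data).
runtimes = {1: 400.0, 2: 210.0, 4: 115.0, 8: 70.0}

for n, tn in runtimes.items():
    speedup, efficiency = strong_scaling(runtimes[1], tn, n)
    print(f"{n} workers: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency close to 1 means added workers are being used well; a value that drops quickly as workers are added corresponds to the saturation described for the Kubernetes deployment.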

Abstract

This thesis evaluates the scalability and performance characteristics of the Dask framework for large-scale analytical workloads across two contrasting computing environments: a local Kubernetes-based deployment and a high-performance computing (HPC) cluster managed by SLURM. Although Dask is increasingly used as a flexible alternative to frameworks such as Apache Spark, its behavior under compute-intensive workloads remains poorly understood. To address this gap, we conduct extensive strong-scaling and weak-scaling experiments using TPC-H–like datasets of different scales, analyze Dask’s adaptive autoscaling mechanism, and integrate DuckDB as an embedded execution engine within Dask workers.

The results show that the local Kubernetes deployment achieves limited scalability due to shared-resource contention, with performance saturating after only a few workers. In contrast, the HPC system maintains high parallel efficiency across many nodes, particularly when running one worker per node. This demonstrates that distributed memory bandwidth and reduced spill-to-disk activity are essential for scaling shuffle-intensive queries. Weak scaling on the HPC system is more stable than on the local cluster, although absolute runtimes are higher because distributed execution introduces significant network and I/O overheads.

Adaptive scaling performs poorly in both environments. On Kubernetes, the autoscaler behaves unpredictably, frequently terminating workers too aggressively and destabilizing long-running queries. On the HPC system, adaptive scaling consistently underperforms fixed-size clusters because the ramp-up phase forces memory-bound workloads to execute with insufficient resources.

Finally, experiments combining Dask with DuckDB reveal substantial performance improvements. Distributed DuckDB consistently outperforms Dask DataFrame execution, often by a factor of three to four, due to more efficient local processing, reduced I/O per worker, and optimized query execution. Even single-worker DuckDB is faster than Dask-only execution, emphasizing the importance of leveraging specialized query engines within distributed data-processing frameworks.

Overall, this thesis provides a detailed empirical analysis of Dask’s behavior under large-scale analytical workloads and offers practical recommendations for configuring Dask on HPC systems, understanding its scaling limits, and integrating complementary technologies such as DuckDB to improve performance.
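
The idea of running an embedded query engine inside each worker can be illustrated with a small partition-then-merge sketch. This is not the thesis's implementation: to keep it runnable with only the Python standard library, `sqlite3` stands in for DuckDB and a thread pool stands in for Dask workers. The table name, column names, and sample rows are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def local_aggregate(rows):
    """Aggregate one partition with an in-process SQL engine.

    sqlite3 is a stand-in for DuckDB here (an assumption for portability).
    Returns a list of (returnflag, sum_quantity, row_count) partial results.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE lineitem (returnflag TEXT, quantity REAL)")
    con.executemany("INSERT INTO lineitem VALUES (?, ?)", rows)
    result = con.execute(
        "SELECT returnflag, SUM(quantity), COUNT(*) "
        "FROM lineitem GROUP BY returnflag"
    ).fetchall()
    con.close()
    return result


def merge_partials(partials):
    """Combine per-partition partial aggregates into global totals."""
    totals = {}
    for partial in partials:
        for flag, qty, cnt in partial:
            prev_qty, prev_cnt = totals.get(flag, (0.0, 0))
            totals[flag] = (prev_qty + qty, prev_cnt + cnt)
    return totals


# Hypothetical partitions; in a real deployment each would live on one worker.
partitions = [
    [("A", 10.0), ("R", 5.0)],
    [("A", 2.0), ("N", 7.0)],
]

# The thread pool is a stand-in for a pool of Dask workers.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(local_aggregate, partitions))

totals = merge_partials(partials)
print(totals)
```

The design point is that each worker reduces its partition locally with an optimized engine and ships back only small partial aggregates, which is one plausible reading of the "more efficient local processing" and "reduced I/O per worker" cited in the abstract.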
