What question did this study set out to answer?

The central aim is to develop a decoupled model for optimal tile size selection that accounts for both compile-time and runtime information.
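
To make the idea of a decoupled selector concrete, the following Python sketch keeps compile-time metadata and runtime measurements as separate inputs to a standalone selection function, here for a matrix-multiplication nest `C[i][j] += A[i][k] * B[k][j]`. Everything in it is hypothetical and illustrative. The `CompileTimeInfo` and `RuntimeInfo` records, the power-of-two candidate set, the cache-capacity feasibility check, and the traffic-based cost are textbook-style simplifications, not TileMind's actual metadata, feasible domain, or objective.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CompileTimeInfo:
    """Hypothetical metadata a pre-tiling pass could export for C[i][j] += A[i][k] * B[k][j]."""
    ni: int          # trip count of loop i
    nj: int          # trip count of loop j
    nk: int          # trip count of loop k
    elem_bytes: int  # size of one array element in bytes


@dataclass(frozen=True)
class RuntimeInfo:
    """Hypothetical profiled machine characteristic."""
    cache_bytes: int  # usable capacity of the targeted cache level


def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division without going through floats.
    return -(-a // b)


def footprint_bytes(ti: int, tj: int, tk: int, ct: CompileTimeInfo) -> int:
    # One tile touches a ti x tk block of A, a tk x tj block of B and a ti x tj block of C.
    return (ti * tk + tk * tj + ti * tj) * ct.elem_bytes


def estimated_traffic(ti: int, tj: int, tk: int, ct: CompileTimeInfo) -> int:
    # Toy cost: every tile reloads its whole footprint from the next memory level.
    tiles = ceil_div(ct.ni, ti) * ceil_div(ct.nj, tj) * ceil_div(ct.nk, tk)
    return tiles * footprint_bytes(ti, tj, tk, ct)


def power_of_two_sizes(n: int):
    # Candidate tile sizes 1, 2, 4, ... up to the loop trip count.
    t = 1
    while t <= n:
        yield t
        t *= 2


def select_tile(ct: CompileTimeInfo, rt: RuntimeInfo):
    """Return the feasible (ti, tj, tk) with the lowest estimated traffic, or None."""
    best, best_cost = None, None
    for ti, tj, tk in product(power_of_two_sizes(ct.ni),
                              power_of_two_sizes(ct.nj),
                              power_of_two_sizes(ct.nk)):
        # Feasible domain: the tile's footprint must fit in the profiled cache capacity.
        if footprint_bytes(ti, tj, tk, ct) > rt.cache_bytes:
            continue
        cost = estimated_traffic(ti, tj, tk, ct)
        if best_cost is None or cost < best_cost:
            best, best_cost = (ti, tj, tk), cost
    return best


if __name__ == "__main__":
    ct = CompileTimeInfo(ni=256, nj=256, nk=256, elem_bytes=8)
    rt = RuntimeInfo(cache_bytes=32 * 1024)
    print(select_tile(ct, rt))  # (32, 32, 32)
```

With these toy inputs the estimated traffic is proportional to 1/ti + 1/tj + 1/tk, and among the power-of-two tiles whose footprint fits in 32 KiB of 8-byte elements, the cubic 32×32×32 tile minimizes that sum. Because the selector only consumes plain data records, it stays outside the compiler pass, which is the property the decoupled design relies on.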

April 15, 2026 · Open Access

A Decoupled Analytical Model for Tile Size Selection in Affine Programs

Key points

Introduced TileMind, a decoupled analytical model for tile size selection in affine programs.
Proposed a transformation-aware pre-tiling step that keeps the selector consistent with compiler transformations and extracts compile-time metadata.
Combined the extracted metadata with profiled runtime characteristics to build a tractable feasible domain for tile size selection.
Reformulated the nonlinear selection objective as a binary product linearization problem and linearized its nonlinear constraints.
Applied an intra-tile optimization that aligns computation with data layout to improve data reuse within tiles.
Achieved mean speedups of 1.49× (sequential) and 1.33× (parallel) over Pluto-tss on twenty PolyBench kernels.
Achieved 2.08–3.54× speedups over Pluto-tss on three deep learning workloads.
Outperformed TVM's MetaSchedule by 1.35–1.46× (mean) while reducing tuning overhead by 2–4 orders of magnitude.
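
Binary product linearization can be illustrated with the standard textbook construction. If tile sizes are chosen through one-hot binary variables, a nonlinear term such as ti · tj becomes a sum of products of binaries, and each product x · y of two binaries can be replaced by a new variable z subject to the linear constraints z ≤ x, z ≤ y, z ≥ x + y − 1 and z ≥ 0. The Python sketch below assumes hypothetical candidate sets `SIZES_I` and `SIZES_J` and, rather than calling an integer-programming solver, checks exhaustively that these constraints pin z to x · y. It shows the general technique only; the paper's exact formulation may differ.

```python
from itertools import product

# Hypothetical candidate tile sizes for loops i and j.
SIZES_I = [8, 16, 32]
SIZES_J = [8, 16, 32]


def satisfies_linearization(x: int, y: int, z: int) -> bool:
    # Standard linear constraints that force z == x * y when x, y, z are binary.
    return z <= x and z <= y and z >= x + y - 1 and z >= 0


def feasible_z(x: int, y: int) -> list:
    # All binary z values that the linear constraints allow for fixed x and y.
    return [z for z in (0, 1) if satisfies_linearization(x, y, z)]


def one_hot(n: int, k: int) -> list:
    # Binary selection vector that picks candidate k out of n.
    return [1 if idx == k else 0 for idx in range(n)]


def linear_footprint(x: list, y: list) -> int:
    # ti * tj rewritten as sum over (a, b) of SIZES_I[a] * SIZES_J[b] * z_ab,
    # which is linear in the auxiliary variables z_ab.
    total = 0
    for a, b in product(range(len(SIZES_I)), range(len(SIZES_J))):
        (z,) = feasible_z(x[a], y[b])  # the constraints leave exactly one value
        total += SIZES_I[a] * SIZES_J[b] * z
    return total


if __name__ == "__main__":
    # The linearization reproduces the product for every binary pair ...
    for x, y in product((0, 1), repeat=2):
        assert feasible_z(x, y) == [x * y]
    # ... and therefore the nonlinear term ti * tj for every tile choice.
    for a, b in product(range(len(SIZES_I)), range(len(SIZES_J))):
        x, y = one_hot(len(SIZES_I), a), one_hot(len(SIZES_J), b)
        assert linear_footprint(x, y) == SIZES_I[a] * SIZES_J[b]
    print("linearization matches ti * tj for all choices")
```

In an actual optimization model the z variables would be decision variables handed to a mixed-integer linear solver together with these constraints, so the objective and cache-capacity constraints stay linear.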

Abstract

Existing tile size selection approaches are tightly coupled with compiler transformation pipelines, which often leads to inaccurate modeling of cache behavior and limited effectiveness for non-rectangular tile shapes. This paper presents TileMind, a decoupled analytical model that combines compile-time and runtime information for tile size selection in affine programs. It introduces a transformation-aware pre-tiling step that keeps the decoupled selector consistent with compiler transformations while extracting compile-time metadata. The extracted metadata is then combined with profiled runtime characteristics to construct a richer yet tractable feasible domain, within which a nonlinear objective for tile size selection is formulated. This objective is subsequently transformed into a binary product linearization problem, and its nonlinear constraints are also linearized for efficient optimization. Finally, an intra-tile optimization aligns computation with data layout to enhance data reuse within tiles. On two multi-core Intel CPUs, TileMind achieves mean speedups of 1.49× (sequential) and 1.33× (parallel) on twenty PolyBench kernels, and 2.08–3.54× speedups on three deep learning workloads, over the state-of-the-art analytical model Pluto-tss. Compared with TVM's latest autotuner MetaSchedule, TileMind delivers 1.35–1.46× mean speedups while reducing tuning overhead by 2–4 orders of magnitude. Beyond demonstrating TileMind's effectiveness in selecting tile sizes for non-rectangular tile shapes and its compatibility with PPCG, Pluto, and TVM, we also provide proof-of-concept results on GPUs that illustrate its potential portability across architectures.
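
The idea of aligning computation with data layout inside a tile can be pictured with a tiled matrix multiplication whose loop order within each tile is i, k, j, so that the innermost loop runs along rows of `B` and `C`, the contiguous direction of a row-major array. This Python sketch is an illustration under stated assumptions: the function names and tile sizes are hypothetical, and TileMind's actual intra-tile transformation may differ. Python lists are not contiguous arrays, so the sketch demonstrates the loop structure and its correctness rather than a speedup; the locality benefit applies to compiled code.

```python
def matmul_reference(A, B):
    # Straightforward triple loop used as the correctness baseline.
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_tiled(A, B, ti, tj, tk):
    # Hypothetical tiled kernel: tiles over (i, j, k), then i -> k -> j inside each tile.
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i0 in range(0, n, ti):
        for j0 in range(0, p, tj):
            for k0 in range(0, m, tk):
                for i in range(i0, min(i0 + ti, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(k0, min(k0 + tk, m)):
                        a, Bk = Ai[k], B[k]
                        # Innermost loop walks along one row of B and C, which is the
                        # contiguous direction in a row-major layout.
                        for j in range(j0, min(j0 + tj, p)):
                            Ci[j] += a * Bk[j]
    return C


if __name__ == "__main__":
    A = [[i + 2 * k for k in range(7)] for i in range(5)]
    B = [[k * j - 1 for j in range(6)] for k in range(7)]
    # Tile sizes need not divide the dimensions; the min() bounds clip edge tiles.
    assert matmul_tiled(A, B, ti=2, tj=4, tk=3) == matmul_reference(A, B)
```

The i-k-j order keeps `A[i][k]` fixed across the innermost loop, so that loop performs a unit-stride update of a row of `C` using a row of `B`, whereas the textbook i-j-k order would stride down a column of `B`.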
