Existing tile size selection approaches are tightly coupled with compiler transformation pipelines, often leading to inaccurate modeling of cache behavior and limited effectiveness for non-rectangular tile shapes. This paper presents TileMind , a decoupled analytical model that combines compile-time and runtime information for tile size selection in affine programs. It introduces a transformation-aware pre-tiling step that enables the decoupled selector to remain consistent with compiler transformations while extracting compile-time metadata. The extracted metadata is then combined with profiled runtime characteristics to construct a richer yet tractable feasible domain, within which a nonlinear objective for tile size selection is formulated. This objective is subsequently transformed into a binary product linearization problem, with its nonlinear constraints also linearized for efficient optimization. Finally, an intra-tile optimization aligns computation with data layout to enhance data reuse within tiles. Across two multi-core Intel CPUs, TileMind achieves 1.49 × (sequential) and 1.33 × (parallel) mean speedups on twenty PolyBench kernels, and 2.08–3.54 × speedups on three deep learning workloads over the state-of-the-art analytical model Pluto-tss . Compared with TVM’s latest autotuner MetaSchedule, TileMind delivers 1.35–1.46 × mean speedups while reducing tuning overhead by 2–4 orders of magnitude. While demonstrating effectiveness on selecting tile sizes for non-rectangular tile shapes and compatibility with PPCG, Pluto, and TVM, we further provide proof-of-concept results on GPUs, illustrating the potential portability of TileMind across architectures.
Yuan et al. (Mon,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: