What question did this study set out to answer?

The study aims to develop a framework that optimizes GPU kernel efficiency and attributes the financial cost of kernel inefficiencies.

March 28, 2026 | Open Access

KernelIQ: A Cost-Aware Framework for GPU Kernel Efficiency Optimization and FinOps Attribution

Key Points

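Proposes KernelIQ, a framework for optimizing GPU kernel efficiency and attributing the financial cost of inefficient kernels.
Introduces the Kernel Efficiency Record (KER) to formalize and profile kernel inefficiencies.
Develops the RECAM model, which maps low-level performance counters to financial cost.
Categorizes inefficiency patterns with the HIT taxonomy: Memory Non-Coalescing (MNC), Warp Occupancy Starvation (WOS), and Kernel Launch Fragmentation (KLF).
Proposes the CGPG protocol, which uses large language models to generate, validate, and deploy optimized kernel patches.
Estimates, through analytical modeling, up to $80,000/day in recoverable inefficiencies in large-scale GPU clusters; empirical validation is deferred to future work.
Structures profiling along three dimensions: memory access behavior, compute utilization, and launch configuration.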

Abstract

Modern GPU clusters incur significant inefficiencies due to suboptimal kernel execution patterns, leading to substantial financial waste. In this work, we introduce KernelIQ, a cost-aware analytical framework that redefines GPU inefficiency as a FinOps problem. KernelIQ formalizes inefficiency through the Kernel Efficiency Record (KER), enabling systematic profiling and attribution of inefficiencies across three primary dimensions: memory access behavior, compute utilization, and launch configuration. We further introduce RECAM, a cost attribution model that maps low-level performance counters to real-world financial impact, and HIT, a taxonomy of inefficiency patterns including Memory Non-Coalescing (MNC), Warp Occupancy Starvation (WOS), and Kernel Launch Fragmentation (KLF). To address these inefficiencies, we propose the CGPG protocol, which leverages large language models to generate, validate, and deploy optimized kernel patches under correctness and performance constraints. While empirical validation is deferred to future work, we demonstrate through analytical modeling that KernelIQ can identify up to $80,000/day in recoverable inefficiencies in large-scale GPU clusters. This work positions GPU performance optimization as an economically grounded systems problem, opening new directions for cost-aware infrastructure intelligence.
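
To make the pipeline in the abstract concrete, the following minimal Python sketch shows how a record with one field per profiling dimension might be tagged with HIT patterns and priced in dollars. The abstract does not give the KER schema or the RECAM formula, so everything here is an assumption made for illustration: the field names, the thresholds, the `classify` and `daily_waste_usd` helpers, the cost rule, and the example numbers.

```python
from dataclasses import dataclass
from enum import Enum


class HitPattern(Enum):
    """The three inefficiency patterns named in the abstract."""
    MNC = "Memory Non-Coalescing"
    WOS = "Warp Occupancy Starvation"
    KLF = "Kernel Launch Fragmentation"


@dataclass
class KernelEfficiencyRecord:
    """Hypothetical KER layout: one metric per profiling dimension."""
    kernel_name: str
    memory_efficiency: float    # useful bytes / bytes moved, in [0, 1]
    achieved_occupancy: float   # active warps / maximum warps, in [0, 1]
    launches_per_second: float  # kernel launch rate
    gpu_hours_per_day: float    # GPU time this kernel consumes per day


# Illustrative thresholds only; they are not taken from the paper.
MEMORY_EFFICIENCY_FLOOR = 0.5
OCCUPANCY_FLOOR = 0.4
LAUNCH_RATE_CEILING = 10_000.0


def classify(record: KernelEfficiencyRecord) -> list[HitPattern]:
    """Tag a record with every pattern whose threshold it violates."""
    patterns = []
    if record.memory_efficiency < MEMORY_EFFICIENCY_FLOOR:
        patterns.append(HitPattern.MNC)
    if record.achieved_occupancy < OCCUPANCY_FLOOR:
        patterns.append(HitPattern.WOS)
    if record.launches_per_second > LAUNCH_RATE_CEILING:
        patterns.append(HitPattern.KLF)
    return patterns


def daily_waste_usd(record: KernelEfficiencyRecord, usd_per_gpu_hour: float) -> float:
    """Assumed cost rule: price the GPU time lost to the weaker of the two ratios."""
    wasted_fraction = 1.0 - min(record.memory_efficiency, record.achieved_occupancy)
    return record.gpu_hours_per_day * usd_per_gpu_hour * wasted_fraction


# Made-up example values, chosen only to exercise the helpers.
example = KernelEfficiencyRecord(
    kernel_name="example_kernel",
    memory_efficiency=0.25,
    achieved_occupancy=0.75,
    launches_per_second=500.0,
    gpu_hours_per_day=1000.0,
)

print([p.value for p in classify(example)])
print(daily_waste_usd(example, usd_per_gpu_hour=2.0))
```

The sketch keeps classification and pricing as separate steps, mirroring the abstract's split between the HIT taxonomy and the RECAM cost model. A real attribution model would presumably weight several performance counters rather than a single ratio.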
