Modern GPU clusters incur significant inefficiencies due to suboptimal kernel execution patterns, leading to substantial financial waste. In this work, we introduce KernelIQ, a cost-aware analytical framework that redefines GPU inefficiency as a FinOps problem. KernelIQ formalizes inefficiency through the Kernel Efficiency Record (KER), enabling systematic profiling and attribution of inefficiencies across three primary dimensions: memory access behavior, compute utilization, and launch configuration. We further introduce RECAM, a cost attribution model that maps low-level performance counters to real-world financial impact, and HIT, a taxonomy of inefficiency patterns including Memory Non-Coalescing (MNC), Warp Occupancy Starvation (WOS), and Kernel Launch Fragmentation (KLF). To address these inefficiencies, we propose the CGPG protocol, which leverages large language models to generate, validate, and deploy optimized kernel patches under correctness and performance constraints. While empirical validation is deferred to future work, we demonstrate through analytical modeling that KernelIQ can identify up to $80,000/day in recoverable inefficiencies in large-scale GPU clusters. This work positions GPU performance optimization as an economically grounded systems problem, opening new directions for cost-aware infrastructure intelligence.
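The cost framing above can be made concrete with a small sketch. The abstract does not give RECAM's actual formula or the fields of a KER, so everything below is an assumption for illustration: the `KernelRecord` fields, the linear mapping from an efficiency fraction to wasted spend, and the price per GPU-hour are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KernelRecord:
    """Hypothetical stand-in for a Kernel Efficiency Record (KER)."""
    name: str
    gpu_hours_per_day: float    # GPU time this kernel consumes per day
    achieved_efficiency: float  # fraction of attainable throughput, in [0, 1]
    pattern: str                # taxonomy label, e.g. "MNC", "WOS", "KLF"


def recoverable_cost_per_day(rec: KernelRecord, price_per_gpu_hour: float) -> float:
    """Assumed linear model: spend scaled by the inefficient fraction of GPU time."""
    if not 0.0 <= rec.achieved_efficiency <= 1.0:
        raise ValueError("achieved_efficiency must lie in [0, 1]")
    return rec.gpu_hours_per_day * price_per_gpu_hour * (1.0 - rec.achieved_efficiency)


def attribute_by_pattern(records, price_per_gpu_hour: float) -> dict:
    """Sum the assumed recoverable cost per inefficiency pattern."""
    totals = {}
    for rec in records:
        cost = recoverable_cost_per_day(rec, price_per_gpu_hour)
        totals[rec.pattern] = totals.get(rec.pattern, 0.0) + cost
    return totals


# Illustrative inputs only; these are not measurements from the paper.
records = [
    KernelRecord("gemm", 100.0, 0.75, "MNC"),
    KernelRecord("reduce", 40.0, 0.5, "WOS"),
    KernelRecord("copy", 10.0, 0.5, "MNC"),
]
totals = attribute_by_pattern(records, price_per_gpu_hour=2.0)
```

Under this assumed model, a cluster-level daily figure is simply the sum of the per-pattern totals, which is how per-kernel counters could roll up into a single dollar amount.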
VEERPRATAP SINGH studied this question.