Steklov Activations presents a family of compact-support, piecewise-polynomial activation functions derived from Steklov kernels in approximation theory. Unlike smooth activations such as GELU or SiLU, whose gates only approach 0 and 1 asymptotically, Steklov activations have a gating function with a finite transition region: outside a controllable interval, a neuron is either exactly inactive (output zero) or exactly linear (input passed through unchanged). This gives the family a tunable mechanism for exact neuron inactivity, a property that common dense activations lack. The paper shows that the family contains HardSwish as an exact special case and can closely approximate GELU, and it introduces a scale parameter that controls the tradeoff between smoothness, selectivity, and sparsity. It evaluates these activations on image classification and language modeling, including GPT-2 and a small LLaMA-style decoder, and analyzes them in terms of task performance, inactivity patterns, pruning, and inference efficiency.
Aleksandr Masalskikh (Thu,) studied this question.
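
To make the compact-support gating concrete, here is a minimal sketch in plain Python. It assumes the simplest possible construction: the gate is the first-order Steklov average of the Heaviside step over a window of half-width h, which works out to a clipped linear ramp. The function names and the parameter h are illustrative assumptions, not the paper's notation, and the paper's smoother higher-order members and exact parameterization are not reproduced here.

```python
def steklov_gate(x: float, h: float = 3.0) -> float:
    """First-order Steklov average of the Heaviside step H.

    (1 / (2h)) * integral of H(t) dt over [x - h, x + h]
    simplifies to clip((x + h) / (2h), 0, 1).
    `h` is a hypothetical name for the scale (half-width) parameter.
    """
    if h <= 0:
        raise ValueError("h must be positive")
    return min(1.0, max(0.0, (x + h) / (2.0 * h)))


def steklov_activation(x: float, h: float = 3.0) -> float:
    """Gated activation x * gate(x): exactly 0 for x <= -h, exactly x for x >= h."""
    return x * steklov_gate(x, h)


def hardswish(x: float) -> float:
    """Reference HardSwish: x * ReLU6(x + 3) / 6."""
    return x * min(6.0, max(0.0, x + 3.0)) / 6.0


if __name__ == "__main__":
    # With h = 3 the sketch coincides with HardSwish at every sample point.
    for x in (-4.0, -3.0, -1.0, 0.0, 2.0, 3.0, 5.0):
        print(x, steklov_activation(x), hardswish(x))
```

In this sketch, h = 3 reproduces HardSwish, and shrinking h narrows the transition region, so more inputs fall where the output is exactly zero or exactly the identity. A smooth activation such as GELU never reaches either regime exactly.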