What question did this study set out to answer?

This study introduces and analyzes a new family of activation functions, Steklov activations, and compares them with standard activations such as GELU.

April 10, 2026 · Open Access

Steklov Activations: Piecewise-Polynomial Gates with Compact Support and Tunable Sparsity

Key Points

Derived activation functions from Steklov kernels in approximation theory
Studied performance in image classification and language modeling tasks
Analyzed inactivity patterns, pruning techniques, and inference efficiency
Demonstrated that Steklov activations can represent HardSwish exactly and closely approximate GELU
Introduced a scale parameter for balancing smoothness, selectivity, and sparsity
Showed improved neuron inactivity and efficient inference on the tasks studied

Abstract

Steklov Activations presents a family of compact-support piecewise-polynomial activation functions derived from Steklov kernels in approximation theory. Unlike standard smooth activations such as GELU or SiLU, Steklov activations have finite support in their gating function: outside a controllable transition region, neurons are exactly inactive or fully linear. This gives the family a distinctive property not present in common dense activations: a tunable mechanism for exact neuron inactivity. The paper shows that the family includes HardSwish exactly and can closely approximate GELU, while introducing a scale parameter that controls the tradeoff between smoothness, selectivity, and sparsity. It studies these activations across image classification and language modeling, including GPT-2 and a small LLaMA-style decoder, and analyzes their behavior in terms of performance, inactivity patterns, pruning, and inference efficiency.
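To make the compact-support idea concrete, here is a minimal sketch of the simplest case. It assumes the gate is the first-order Steklov mean of the Heaviside step, that is, the step averaged over a window of half-width `h`, which reduces to a clipped linear ramp. The names `steklov_gate` and `steklov_activation` and the parameter `h` are illustrative and are not taken from the paper, whose exact construction and parameterization may differ. Under this assumption, `h = 3` reproduces the standard HardSwish formula `x * relu6(x + 3) / 6`.

```python
def steklov_gate(x: float, h: float = 3.0) -> float:
    """First-order Steklov mean of the Heaviside step (assumed form of the gate).

    (1 / (2h)) * integral of H(t) over [x - h, x + h] is a clipped ramp:
    exactly 0 for x <= -h, exactly 1 for x >= h, linear in between.
    """
    if h <= 0:
        raise ValueError("h must be positive")
    return min(1.0, max(0.0, (x + h) / (2.0 * h)))


def steklov_activation(x: float, h: float = 3.0) -> float:
    """Gated activation x * gate(x); h is a hypothetical scale parameter."""
    return x * steklov_gate(x, h)


def hardswish(x: float) -> float:
    """Standard HardSwish: x * relu6(x + 3) / 6."""
    return x * min(6.0, max(0.0, x + 3.0)) / 6.0


if __name__ == "__main__":
    for x in (-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 5.0):
        print(x, steklov_activation(x, h=3.0), hardswish(x))
```

In this sketch the neuron is exactly inactive for `x <= -h` and exactly linear for `x >= h`, so shrinking `h` narrows the transition region and moves the function toward ReLU, while growing `h` widens it. Smoother members of the family would presumably come from higher-order kernels, which this example does not cover.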
