May 3, 2026 · Open Access

Designing Bounded Activations: Geometric Control of Neural Network Stability, Sparsity, Quantization, and Energy-Based Models

Key Points

This research aims to establish a framework for designing bounded activations in neural networks to enhance stability and efficiency.
Developed the SoftCap family of bounded activations, including SoftCap, SwishCap, and SparseCap.
Paired the activations with variance-preserving initialization and ran grokking stress tests across 16 aggressive configurations.
Evaluated the activations in transformers (training stability, peak attention scores, INT8 quantization) and in energy-based models (score-tail excursions, sampling fidelity).
SwishCap survived all 16 aggressive grokking configurations (100% survivorship).
The SoftCap family remained stable at learning-rate multipliers up to 80×, where standard controls collapse, while improving perplexity over GELU baselines.
SparseCap produced 8.9% structural sparsity in NanoGPT query activations, and bounded geometry compressed outlier logit gaps under heavy-tailed OOD shifts by up to 85×.
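The activation formulas are given in the full paper and are not reproduced here. To make two of the key points concrete, the NumPy sketch below uses a hypothetical stand-in, `capped_rectifier`. It outputs an exact zero for non-positive inputs and `cap * tanh(x / cap)` above zero, with `cap` an assumed parameter. This is not the authors' SoftCap definition. The sketch illustrates two generic ideas: a variance-preserving gain computed from an activation's second moment under unit-Gaussian inputs, and structural sparsity measured as the fraction of outputs that are exactly zero.

```python
import numpy as np


def capped_rectifier(x: np.ndarray, cap: float = 3.0) -> np.ndarray:
    """Hypothetical C0 bounded rectifier, NOT the paper's SoftCap formula.

    Exact zeros for x <= 0; cap * tanh(x / cap) for x > 0. The output is
    bounded in [0, cap], and the slope jumps from 0 to 1 at the origin,
    so the function is continuous (C0) but not differentiable there.
    """
    return np.where(x > 0, cap * np.tanh(x / cap), 0.0)


def variance_preserving_gain(act, n_samples: int = 1_000_000, seed: int = 0) -> float:
    """Monte Carlo estimate of g = 1 / sqrt(E[act(z)^2]) for z ~ N(0, 1).

    If zero-mean weights are drawn with variance g**2 / fan_in, a
    unit-variance pre-activation stays unit-variance after act() and the
    next linear layer (for ReLU this recovers the familiar g = sqrt(2)).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    return float(1.0 / np.sqrt(np.mean(act(z) ** 2)))


def exact_zero_fraction(a: np.ndarray) -> float:
    """Structural sparsity: fraction of entries that are exactly 0.0."""
    return float(np.mean(a == 0.0))


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal(10_000)
    h = capped_rectifier(x)
    print("max output:", h.max())  # bounded by cap = 3.0
    print("exact-zero fraction:", exact_zero_fraction(h))  # near 0.5 for zero-mean input
    print("gain for capped_rectifier:", variance_preserving_gain(capped_rectifier))
    print("gain for ReLU:", variance_preserving_gain(lambda z: np.maximum(z, 0.0)))
```

Because the capped function is never larger than ReLU for positive inputs, its second moment is smaller and its variance-preserving gain comes out above sqrt(2). Any bounded activation needs its own gain rather than ReLU's default.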

Abstract

While unbounded activations such as ReLU and SiLU drive modern architectures, their lack of geometric constraints necessitates compensatory mechanisms to mitigate failure modes in stability, quantization, and implicit modeling. We address this with a constraints-first design framework yielding the **SoftCap family**, a C⁰-to-C² progression of bounded rectifiers usable as drop-in replacements: **SoftCap** (C⁰, exact zero), **SwishCap** (C¹, derivative-matched), and **SparseCap** (C², quintic notch). Because each activation is derived in closed form and anchored by variance-preserving initialization, the framework replaces benchmark-driven empirical search with principled forward constraints. In grokking stress tests, SwishCap achieves 100% survivorship across all 16 aggressive configurations, supporting the hypothesis that origin-adjacent recovery geometry and tight forward scale jointly expand the safe operating region. In transformers, the SoftCap family remains stable at learning-rate multipliers of **up to 80×**, where standard controls collapse, while improving perplexity over standard GELU baselines and reducing peak attention scores by 3–4×, which decreases reliance on explicit clamping. Bounded FFNs also suppress post-activation outliers by 15× and reduce INT8 quantization-induced perplexity degradation by **over 25%**. Across heavy-tailed OOD shifts, the same bounded geometry compresses outlier logit gaps by up to 85×. Structurally, SparseCap natively produces 8.9% structural sparsity in NanoGPT query activations, establishing a mathematical foundation for sparse attention without post-hoc thresholding. Finally, in energy-based models (EBMs), bounded geometry provides an architectural alternative to objective-level landscape regularization: the SoftCap family suppresses score-tail excursions, reduces spurious drift, and raises the high-fidelity sampling ceiling.
Together, these findings support geometric activation bounds as a shared mechanism for regulating failure modes across explicit and implicit architectures, offering a unified framework for robust model design.
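One plausible, generic reading of the quantization result is a range-versus-resolution trade-off. Under per-tensor symmetric INT8 quantization, the step size is the tensor's largest magnitude divided by 127, so a single large outlier coarsens the grid for every other value. The NumPy sketch below is a hypothetical illustration, not the paper's quantization pipeline. It compares round-trip error for a synthetic rectified tensor containing one outlier against the same tensor capped at an assumed bound of 4.0.

```python
import numpy as np


def int8_symmetric_roundtrip(x: np.ndarray) -> np.ndarray:
    """Per-tensor symmetric INT8 fake quantization (quantize, then dequantize).

    The scale is max|x| / 127, so one large outlier widens the
    quantization step for every other entry in the tensor.
    """
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:
        return x.astype(np.float64)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float64) * scale


def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square difference between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic rectified activations with one large post-activation outlier.
    acts = np.maximum(rng.standard_normal(4096), 0.0)
    acts[0] = 60.0
    # Hypothetical stand-in for a bounded activation: same values capped at 4.0.
    capped = np.minimum(acts, 4.0)

    for name, t in [("with outlier", acts), ("capped at 4.0", capped)]:
        err = rmse(t, int8_symmetric_roundtrip(t))
        print(f"{name:>14}: step = {np.max(np.abs(t)) / 127:.4f}, INT8 RMSE = {err:.4f}")
```

Capping shrinks the step from 60/127 to 4/127, so the round-trip error on the bulk of the values drops roughly in proportion. Capping here also discards the outlier's magnitude, which a real model would have to learn to do without. The toy numbers say nothing about perplexity and do not reproduce the paper's figures.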

Authors

Larry Cai

Australian Regenerative Medicine Institute

Jie Tang

Australian Regenerative Medicine Institute

Institutions

Monash University

Australian Regenerative Medicine Institute
