What question did this study set out to answer?

The central aim is to investigate how overparameterization modifies the loss landscape of one-hidden-layer ReLU networks with Lipschitz losses.

February 13, 2026 · Open Access

Asymptotic Smoothing of the Lipschitz Loss Landscape in Overparameterized One-Hidden-Layer ReLU Networks

Key Points

Analyzed convex L-Lipschitz losses with ℓ1-regularization on the output layer.
Proved that any two models at the same loss level can be connected by a continuous path with arbitrarily small excess loss.
Derived an asymptotic bound on the energy barrier as a function of network width.
Ran Dynamic String Sampling experiments on synthetic Moons data and the Wisconsin Breast Cancer dataset.
Extended earlier quadratic-loss connectivity results to a broader class of losses, including logistic and cross-entropy.
Showed that the energy barrier vanishes as network width increases, so sublevel sets become connected in the infinite-width limit.
Observed smaller pairwise barriers for wider networks; a permutation test on the maximum gap gave p = 0.
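
To make the notion of an energy barrier in the key points above concrete, here is a minimal Python sketch. It is not the paper's Dynamic String Sampling procedure: it measures the barrier only along the straight segment between two parameter sets, which gives a crude upper estimate of the barrier over all paths. The logistic loss is used as one example of a convex loss that is 1-Lipschitz in the network output. The function names, the penalty weight `lam`, the random initialisation, and the toy data are all assumptions made for illustration.

```python
import math
import random


def relu_net(params, x):
    """One-hidden-layer ReLU network: f(x) = sum_j a_j * relu(w_j . x + b_j)."""
    W, b, a = params
    return sum(
        a_j * max(0.0, sum(w * xi for w, xi in zip(w_j, x)) + b_j)
        for w_j, b_j, a_j in zip(W, b, a)
    )


def loss(params, data, lam=1e-3):
    """Mean logistic loss plus an l1 penalty on the output weights a."""
    _, _, a = params
    total = 0.0
    for x, y in data:  # labels y are -1 or +1
        z = -y * relu_net(params, x)
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # stable log(1 + e^z)
    return total / len(data) + lam * sum(abs(v) for v in a)


def interpolate(p0, p1, t):
    """Pointwise convex combination (1 - t) * p0 + t * p1 of two parameter sets."""
    mix = lambda u, v: (1.0 - t) * u + t * v
    W = [[mix(u, v) for u, v in zip(r0, r1)] for r0, r1 in zip(p0[0], p1[0])]
    b = [mix(u, v) for u, v in zip(p0[1], p1[1])]
    a = [mix(u, v) for u, v in zip(p0[2], p1[2])]
    return W, b, a


def path_barrier(p0, p1, data, steps=50):
    """Highest loss on the sampled segment minus the larger endpoint loss."""
    endpoint = max(loss(p0, data), loss(p1, data))
    worst = max(loss(interpolate(p0, p1, k / steps), data) for k in range(steps + 1))
    return worst - endpoint


def random_params(m, d, rng):
    """Hypothetical Gaussian initialisation for width m and input dimension d."""
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    b = [rng.gauss(0, 1) for _ in range(m)]
    a = [rng.gauss(0, 1) / m for _ in range(m)]
    return W, b, a


# Toy usage: untrained random networks on made-up 2-D points.
rng = random.Random(0)
data = [([rng.uniform(-1, 1), rng.uniform(-1, 1)], rng.choice([-1, 1])) for _ in range(30)]
theta_a, theta_b = random_params(16, 2, rng), random_params(16, 2, rng)
print(path_barrier(theta_a, theta_b, data))
```

Because the sampled path includes both endpoints, the returned value is never negative. The networks here are untrained, so the printed number says nothing about the paper's results; comparing barriers across widths would require trained models at matching loss levels.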

Abstract

This work studies how overparameterization reshapes the loss landscape of one-hidden-layer ReLU networks. On the theory side, it proves that for convex \(L\)-Lipschitz losses with \(\ell_1\)-regularization on the output layer, any two models at the same loss level can be connected by a continuous path with arbitrarily small excess loss \(\epsilon\), extending earlier quadratic-loss connectivity results to a broader class of objectives (including logistic and cross-entropy settings). It also derives an asymptotic bound on the energy barrier that decays as a negative power of the width \(m\), showing that the barrier vanishes as \(m\) increases, so sublevel sets become connected in the infinite-width limit. Empirically, Dynamic String Sampling experiments on synthetic Moons data and the Wisconsin Breast Cancer dataset show smaller pairwise barriers for wider networks; a permutation test on the maximum gap gives \(p = 0\), indicating a clear reduction in worst-case barrier height with width.
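
A permutation test on the maximum gap, as mentioned in the abstract, might look like the following sketch. The abstract does not spell out the test statistic, so this assumes it is the difference between the largest barrier in the narrow-network group and the largest barrier in the wide-network group, and that the p-value is the raw fraction of label shuffles whose statistic is at least as large as the observed one. The function name and the barrier values are hypothetical and are not taken from the paper.

```python
import random


def max_gap_permutation_test(narrow, wide, n_perm=2000, seed=0):
    """One-sided permutation test on max(narrow) - max(wide).

    Returns the observed statistic and the fraction of random relabelings
    whose statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = max(narrow) - max(wide)
    pooled = list(narrow) + list(wide)
    k = len(narrow)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        stat = max(pooled[:k]) - max(pooled[k:])
        if stat >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical pairwise barrier heights for two widths (illustration only).
narrow_barriers = [0.42, 0.51, 0.38, 0.47, 0.55, 0.44, 0.49, 0.40]
wide_barriers = [0.08, 0.05, 0.11, 0.07, 0.09, 0.06, 0.10, 0.04]

observed, p_value = max_gap_permutation_test(narrow_barriers, wide_barriers)
print(observed, p_value)
```

With this raw-fraction estimator, a reported value of exactly 0 means that none of the sampled shuffles matched or exceeded the observed gap. A common alternative is the estimator `(hits + 1) / (n_perm + 1)`, which can never return exactly 0.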
