What question did this study set out to answer?

June 19, 2026

A Non-local Convergence Analysis of Gradient Flow for Deep Linear Networks

Key Points

Studies the non-local convergence of gradient flow for deep linear networks that contain at least one layer with a single neuron.
Works under the quadratic loss, with trajectories started from an arbitrary balanced initialization.
Describes the point each such trajectory converges to, including trajectories that converge to a saddle point.
Gives explicit, stage-by-stage convergence rates for these trajectories, ranging from sublinear to linear.
To the authors' knowledge, presents the first explicit non-local analysis of this setting, in contrast to the lazy training regime that has dominated the literature.

Abstract

In this paper we study the non-local convergence properties of deep linear networks with a one-neuron layer. Specifically, under the quadratic loss, we consider optimizing deep linear networks in which there is at least one layer with only one neuron. We describe the convergent point of trajectories with an arbitrary balanced starting point under gradient flow, including the paths which converge to one of the saddle points. We also show specific convergence rates of these trajectories by stages with the explicit rates varying from sublinear to linear. As far as we know, our results are the first to give an explicit non-local analysis of such deep linear neural networks with arbitrary balanced initialization under the quadratic loss, rather than the lazy training regime which has dominated the literature of neural networks.
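The setting can be made concrete with a toy special case, sketched below under stated assumptions. It is not the paper's general construction: every layer here has exactly one neuron, so each weight is a scalar, the loss is 0.5 * (w_N * ... * w_1 - y)^2, and "balanced" reduces to w_{j+1}^2 = w_j^2 (the scalar form of the commonly used condition W_{j+1}^T W_{j+1} = W_j W_j^T). Gradient flow is approximated by forward Euler with a small step. The target `y`, the initial scale, the step size, and all function names are illustrative choices, not taken from the paper.

```python
def product(ws):
    """End-to-end map of the scalar network: w_N * ... * w_1."""
    p = 1.0
    for w in ws:
        p *= w
    return p


def loss(ws, y):
    """Quadratic loss 0.5 * (product - y)^2."""
    return 0.5 * (product(ws) - y) ** 2


def euler_step(ws, y, dt):
    """One forward-Euler step of gradient flow dw_j/dt = -dL/dw_j."""
    residual = product(ws) - y
    new_ws = []
    for j, w in enumerate(ws):
        # dL/dw_j = residual * (product of all other weights)
        others = product(ws[:j] + ws[j + 1:])
        new_ws.append(w - dt * residual * others)
    return new_ws


def balance_gap(ws):
    """Largest violation of the scalar balance condition w_{j+1}^2 = w_j^2."""
    return max(abs(b * b - a * a) for a, b in zip(ws, ws[1:]))


def simulate(ws, y, dt=1e-3, steps=20000):
    """Integrate the flow for total time dt * steps; return the final weights."""
    for _ in range(steps):
        ws = euler_step(ws, y, dt)
    return ws


if __name__ == "__main__":
    y = 1.0  # assumed target
    # Two balanced starts with the same magnitudes but different sign patterns.
    starts = {
        "product has the sign of y": [0.8, 0.8, 0.8],
        "product has the opposite sign": [0.8, 0.8, -0.8],
    }
    for name, ws0 in starts.items():
        ws = simulate(ws0, y)
        print(name, "| loss:", loss(ws, y), "| balance gap:", balance_gap(ws))
```

In this toy case the two starts behave very differently. When the initial product has the same sign as `y`, the loss is driven to essentially zero. When it has the opposite sign, the weights shrink toward the origin, which is a critical point that is not a minimum, and the loss approaches 0.5 * y^2. For three layers a short calculation on the scalar flow suggests the shrinkage there is polynomial in time rather than exponential, which gives one simple picture of how a sublinear rate can arise. The balance gap stays at floating-point level in both runs.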
