We study the non-local convergence properties of deep linear networks with a one-neuron layer. Specifically, we consider optimizing, under the quadratic loss, deep linear networks in which at least one layer has only one neuron. For gradient flow started from an arbitrary balanced initialization, we characterize the point to which each trajectory converges, including the trajectories that converge to one of the saddle points. We also establish stage-wise convergence rates for these trajectories, with explicit rates ranging from sublinear to linear. To the best of our knowledge, these results are the first explicit non-local analysis of such deep linear networks with arbitrary balanced initialization under the quadratic loss, in contrast to the lazy training regime that has dominated the neural network literature.
Chen et al. (Thu,) studied this question.
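For concreteness, the setting can be sketched as follows. The notation is an assumption made for illustration and is not fixed by the text above: $W_1, \dots, W_N$ denote the weight matrices, $(X, Y)$ the training data, and balancedness is taken in the sense commonly used for deep linear networks.
\[
L(W_1,\dots,W_N) = \tfrac{1}{2}\,\bigl\| W_N \cdots W_1 X - Y \bigr\|_F^2,
\qquad
\dot W_j(t) = -\frac{\partial L}{\partial W_j},
\qquad
W_{j+1}^{\top} W_{j+1} = W_j W_j^{\top} \quad (1 \le j \le N-1).
\]
Under gradient flow one has $\frac{d}{dt}\bigl(W_{j+1}^{\top} W_{j+1} - W_j W_j^{\top}\bigr) = 0$ for every $j$, so a balanced starting point remains balanced along the whole trajectory. In this notation, a one-neuron layer means that some hidden width equals 1, so the end-to-end product $W_N \cdots W_1$ has rank at most 1.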