Key points are not available for this paper at this time.
We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given O (dr^2) random linear measurements of a rank r positive semidefinite matrix X^, we can recover X^ by parameterizing it by UU^ with U R^d d and minimizing the squared loss, even if r d. We prove that starting from a small initialization, gradient descent recovers X^ in O (r) iterations approximately. The results solve the conjecture of Gunasekar et al. '17 under the restricted isometry property. The technique can be applied to analyzing neural networks with one-hidden-layer quadratic activations with some technical modifications.
Li et al. (Tue,) studied this question.