We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^{2})$ random linear measurements of a rank $r$ positive semidefinite matrix $X^{\star}$, we can recover $X^{\star}$ by parameterizing it by $UU^{\top}$ with $U \in \mathbb{R}^{d \times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that, starting from a small initialization, gradient descent approximately recovers $X^{\star}$ in $\tilde{O}(\sqrt{r})$ iterations. The results solve the conjecture of Gunasekar et al. '17 under the restricted isometry property. The technique can be applied to analyzing one-hidden-layer neural networks with quadratic activations, with some technical modifications.
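As a rough numerical illustration of the setting described above (not the authors' code), the following NumPy sketch runs plain gradient descent on the squared loss over a full $d \times d$ factor $U$, starting from the small initialization $U_0 = \alpha I$. The dimensions, the step size `eta`, the iteration count `steps`, the initialization scale `alpha`, and the symmetric Gaussian measurement model are all assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) problem sizes and hyperparameters.
d, r, m = 30, 2, 400                  # m is below d*(d+1)/2 = 465
alpha, eta, steps = 1e-3, 0.05, 600   # init scale, step size, iterations

# Ground truth: rank-r PSD matrix whose nonzero eigenvalues all equal 1.
V, _ = np.linalg.qr(rng.standard_normal((d, r)))
X_star = V @ V.T

# Symmetric Gaussian measurement matrices A_i, one flattened matrix per row.
G = rng.standard_normal((m, d, d))
A = ((G + G.transpose(0, 2, 1)) / 2).reshape(m, d * d)
y = A @ X_star.ravel()                # y_i = <A_i, X_star>


def loss_and_grad(U):
    """Return f(U) = (1/2m) * sum_i (<A_i, U U^T> - y_i)^2 and its gradient."""
    residual = A @ (U @ U.T).ravel() - y
    # M = (1/m) * sum_i residual_i * A_i, a symmetric d x d matrix.
    M = (A.T @ residual).reshape(d, d) / m
    # Because each A_i is symmetric, the gradient with respect to U is 2 M U.
    return 0.5 * np.mean(residual ** 2), 2.0 * M @ U


U = alpha * np.eye(d)                 # small, full-rank d x d initialization
loss_start, _ = loss_and_grad(U)
for _ in range(steps):
    _, grad = loss_and_grad(U)
    U = U - eta * grad
loss_end, _ = loss_and_grad(U)

# Relative Frobenius-norm distance between U U^T and the ground truth.
rel_err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
print(f"train loss {loss_start:.3e} -> {loss_end:.3e}, relative error {rel_err:.3e}")
```

In this sketch $m = 400$ is below the 465 degrees of freedom of a symmetric $30 \times 30$ matrix, and $U$ has 900 free parameters, so the model is over-parameterized relative to the data. The printed relative error gives a quick check of how close $UU^{\top}$ ends up to $X^{\star}$ under these assumed settings.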
Li et al. studied this question.