We study the problem of gradient descent learning of a single-index target function $f_*(x) = \sigma_*(\langle x, \theta \rangle)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*: \mathbb{R} \to \mathbb{R}$ is an unknown degree-$q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n \gtrsim d^{\Theta(p)}$ samples, and such statistical complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary polynomial link function with a sample and runtime complexity of $n \simeq T \simeq C(q) \cdot d \,\mathrm{polylog}\, d$, where the constant $C(q)$ depends only on the degree of $\sigma_*$, regardless of the information exponent; this dimension dependence matches the information-theoretic limit up to polylogarithmic factors. Core to our analysis is the reuse of minibatch in the gradient computation, which gives rise to higher-order information beyond correlational queries.
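To make the notion of information exponent concrete, the following minimal sketch computes it for a polynomial link function by converting its monomial coefficients into the probabilists' Hermite basis with NumPy. The function name `information_exponent` and the tolerance `tol` are illustrative choices, not taken from the paper, and the sketch assumes the usual convention that the constant (degree-0) term is ignored.

```python
import numpy as np
from numpy.polynomial import hermite_e as H


def information_exponent(poly_coeffs, tol=1e-12):
    """Return the lowest degree k >= 1 with a nonzero Hermite coefficient.

    poly_coeffs lists monomial coefficients in increasing degree,
    e.g. [0, -3, 0, 1] encodes z**3 - 3*z.
    Assumption (not from the source): the degree-0 term is skipped.
    """
    # Convert from the monomial basis to the probabilists' Hermite basis He_k.
    herm = H.poly2herme(np.asarray(poly_coeffs, dtype=float))
    for k in range(1, len(herm)):
        if abs(herm[k]) > tol:
            return k
    return None  # constant link: no nonzero term of degree >= 1


print(information_exponent([0, -3, 0, 1]))  # z^3 - 3z = He_3          -> 3
print(information_exponent([0, 0, 1]))      # z^2     = He_2 + He_0    -> 2
print(information_exponent([0, 0, 0, 1]))   # z^3     = He_3 + 3 He_1  -> 1
```

The first and third examples are both degree-3 polynomials, yet their information exponents differ (3 versus 1). This is the gap between $q$ and $p$ that the abstract refers to: the earlier $d^{\Theta(p)}$ bounds depend on $p$, while the complexity stated here depends on the link function only through its degree $q$.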
Lee et al. (Mon,) studied this question.