June 3, 2024 · Open Access

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

Abstract

We study the problem of gradient descent learning of a single-index target function $f_*(x) = \sigma_*(\langle x, \theta \rangle)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*: \mathbb{R} \to \mathbb{R}$ is an unknown degree-$q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n \gtrsim d^{\Theta(p)}$ samples, and such statistical complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary polynomial link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d \, \mathrm{polylog}\, d$, where the constant $C(q)$ only depends on the degree of $\sigma_*$, regardless of information exponent; this dimension dependence matches the information-theoretic limit up to polylogarithmic factors. Core to our analysis is the reuse of minibatch in the gradient computation, which gives rise to higher-order information beyond correlational queries.
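To make the notion of information exponent concrete, the following is a minimal sketch (not taken from the paper) that computes it for a polynomial link function. It assumes the common convention that the constant Hermite term is ignored, so the exponent is the smallest degree k >= 1 with a nonzero coefficient in the probabilists' Hermite basis. The function name `information_exponent` and the tolerance `tol` are illustrative choices.

```python
import numpy as np
from numpy.polynomial import hermite_e as He


def information_exponent(power_coeffs, tol=1e-12):
    """Return the smallest k >= 1 whose probabilists' Hermite coefficient is nonzero.

    power_coeffs lists the polynomial's coefficients in the monomial basis,
    lowest degree first. Returns None for a constant polynomial.
    Assumption: the degree-0 (constant) term is excluded from the definition.
    """
    # Convert from the monomial basis to the He_k (probabilists' Hermite) basis.
    herm = He.poly2herme(np.asarray(power_coeffs, dtype=float))
    for k in range(1, len(herm)):
        if abs(herm[k]) > tol:
            return k
    return None


# z^3 = He_3(z) + 3 He_1(z): the linear component gives exponent 1.
cubic = information_exponent([0.0, 0.0, 0.0, 1.0])
# z^3 - 3z = He_3(z): the lowest non-constant component has degree 3.
pure_he3 = information_exponent([0.0, -3.0, 0.0, 1.0])
print(cubic, pure_he3)
```

Both example links have degree q = 3, yet their information exponents differ. This is the gap the abstract refers to: earlier sample-complexity guarantees scale with p, while the stated result depends only on q.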

Cite This Study

Lee et al. (2024) studied this question.

synapsesocial.com/papers/68e66845b6db6435875f466c https://doi.org/10.48550/arxiv.2406.01581
