Neural networks have many successful applications, yet much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, even though the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified by empirical studies on synthetic data and on the MNIST dataset.
Li et al. studied this question.
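The setting described in the abstract can be illustrated with a small NumPy sketch: a wide two-layer ReLU network, randomly initialized, trained by single-sample SGD on synthetic data drawn from a mixture of well-separated clusters. This is only an illustration under stated assumptions, not the paper's experimental code. The function names (`sample_mixture`, `train_sgd`, `predict`, `run_demo`), the choice to train only the hidden layer while keeping a fixed random output layer, the cross-entropy loss, and all hyperparameters (dimension, width, learning rate, noise level) are assumptions made for the sketch.

```python
import numpy as np


def sample_mixture(n, centers, center_labels, noise, rng):
    """Draw n points from a mixture of isotropic Gaussian clusters (hypothetical helper)."""
    idx = rng.integers(0, len(centers), size=n)
    x = centers[idx] + noise * rng.standard_normal((n, centers.shape[1]))
    return x, center_labels[idx]


def train_sgd(x, y, num_classes, width, lr, epochs, rng):
    """Single-sample SGD on cross-entropy loss for f(x) = a @ relu(w @ x).

    Assumption of this sketch: only the hidden layer w is trained; the output
    layer a is a fixed random sign matrix.
    """
    n, d = x.shape
    w = rng.standard_normal((width, d)) / np.sqrt(d)  # random initialization
    a = rng.choice([-1.0, 1.0], size=(num_classes, width)) / np.sqrt(width)
    for _ in range(epochs):
        for i in rng.permutation(n):
            pre = w @ x[i]                    # hidden pre-activations
            hidden = np.maximum(pre, 0.0)     # ReLU
            logits = a @ hidden
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()              # softmax
            probs[y[i]] -= 1.0                # gradient of the loss w.r.t. logits
            grad_hidden = a.T @ probs
            grad_w = np.outer(grad_hidden * (pre > 0), x[i])
            w -= lr * grad_w
    return w, a


def predict(w, a, x):
    """Return the predicted class index for each row of x."""
    return np.argmax(np.maximum(x @ w.T, 0.0) @ a.T, axis=1)


def run_demo(seed=0):
    """Train on one sample of the mixture and return accuracy on a fresh sample."""
    rng = np.random.default_rng(seed)
    num_classes, clusters_per_class, d = 3, 2, 20
    # Cluster centers on the unit sphere; each class owns two clusters.
    centers = rng.standard_normal((num_classes * clusters_per_class, d))
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    center_labels = np.repeat(np.arange(num_classes), clusters_per_class)
    x_train, y_train = sample_mixture(300, centers, center_labels, 0.05, rng)
    x_test, y_test = sample_mixture(300, centers, center_labels, 0.05, rng)
    w, a = train_sgd(x_train, y_train, num_classes,
                     width=256, lr=0.5, epochs=5, rng=rng)
    return float(np.mean(predict(w, a, x_test) == y_test))
```

Calling `run_demo()` returns the held-out accuracy of the trained network. The width of 256 is much larger than the six clusters in the data, which loosely mirrors the overparameterized regime the abstract refers to; the sketch does not attempt to reproduce the paper's bounds or its MNIST experiments.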