In recent years, state-of-the-art methods in computer vision have utilized deep convolutional neural network architectures (CNNs), with some of the most successful models employing hundreds or even thousands of layers. A variety of pathologies such as vanishing/exploding gradients make training such networks challenging. While residual connections and batch normalization enable training at these depths, it has remained unclear whether such architecture designs are truly necessary to train deep CNNs. In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme. We derive this initialization scheme theoretically by developing a mean field theory for signal propagation and by characterizing the conditions for dynamical isometry, the equilibration of singular values of the input-output Jacobian matrix. These conditions require that the convolution operator be an orthogonal transformation in the sense that it is norm-preserving. We present an algorithm for generating such random initial orthogonal convolution kernels and demonstrate empirically that they enable efficient training of extremely deep architectures.
Xiao et al. studied this question.
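To make the norm-preservation condition concrete, the following is a minimal sketch of one simple way to build a convolution kernel that acts as an orthogonal transformation: every spatial tap is zero except the centre, which holds a random orthogonal matrix over the channels. This is an illustrative construction under stated assumptions (equal input and output channel counts, odd kernel size, zero padding), not necessarily the algorithm the authors present. The function names `random_orthogonal`, `delta_orthogonal_kernel`, `conv2d_same`, and `squared_norm` are hypothetical, and the code is written in plain Python to stay dependency-free.

```python
import math
import random


def random_orthogonal(n, rng):
    """Return an n x n matrix with orthonormal rows (Gram-Schmidt on Gaussian draws)."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components along the rows accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(u, v))
            v = [b - dot * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        if norm > 1e-8:  # skip a (very unlikely) degenerate draw
            rows.append([b / norm for b in v])
    return rows


def delta_orthogonal_kernel(k, channels, rng):
    """Build a k x k grid of channels x channels matrices.

    Assumption for this sketch: all taps are zero except the centre tap,
    which is a random orthogonal matrix.
    """
    assert k % 2 == 1, "kernel size must be odd so that a centre tap exists"
    kernel = [[[[0.0] * channels for _ in range(channels)]
               for _ in range(k)] for _ in range(k)]
    kernel[k // 2][k // 2] = random_orthogonal(channels, rng)
    return kernel


def conv2d_same(x, kernel):
    """Zero-padded 'same' cross-correlation; x[i][j] is a vector of channels."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    k = len(kernel)
    r = k // 2
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(k):
                for dj in range(k):
                    ii, jj = i + di - r, j + dj - r
                    if 0 <= ii < h and 0 <= jj < w:
                        m = kernel[di][dj]
                        for o in range(c):
                            out[i][j][o] += sum(
                                m[o][p] * x[ii][jj][p] for p in range(c)
                            )
    return out


def squared_norm(x):
    """Sum of squares over all pixels and channels."""
    return sum(v * v for row in x for pixel in row for v in pixel)


rng = random.Random(0)
kernel = delta_orthogonal_kernel(k=3, channels=4, rng=rng)
x = [[[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)] for _ in range(5)]
y = conv2d_same(x, kernel)
norm_in, norm_out = squared_norm(x), squared_norm(y)
print(norm_in, norm_out)  # the two values should agree up to rounding error
```

Because only the centre tap is non-zero, the convolution applies the same orthogonal matrix to the channel vector at every pixel, so the squared norm of the output matches that of the input. In practice one would use an array library rather than nested lists; the loops here are only meant to keep the arithmetic visible.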