June 14, 2018 · Open Access

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

Abstract

In recent years, state-of-the-art methods in computer vision have utilized deep convolutional neural network architectures (CNNs), with some of the most successful models employing hundreds or even thousands of layers. A variety of pathologies, such as vanishing/exploding gradients, make training such networks challenging. While residual connections and batch normalization enable training at these depths, it has remained unclear whether such architecture designs are truly necessary to train deep CNNs. In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme. We derive this initialization scheme theoretically by developing a mean field theory for signal propagation and by characterizing the conditions for dynamical isometry, the equilibration of singular values of the input-output Jacobian matrix. These conditions require that the convolution operator be an orthogonal transformation in the sense that it is norm-preserving. We present an algorithm for generating such random initial orthogonal convolution kernels and demonstrate empirically that they enable efficient training of extremely deep architectures.
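
The requirement that the convolution operator be norm-preserving can be illustrated with a minimal NumPy sketch of one simple construction, a "delta-orthogonal" kernel. Every spatial tap is zero except the center, which holds a random matrix with orthonormal rows. The function names (`random_orthogonal`, `delta_orthogonal`, `conv2d_same`) are hypothetical. The sketch is only an illustration under these assumptions and is not necessarily the exact algorithm the paper presents. Because only the center tap is nonzero, each output pixel is the matching input pixel multiplied by that matrix. The feature-map norm is therefore preserved exactly, regardless of padding. This assumes c_in <= c_out.

```python
import numpy as np


def random_orthogonal(c_in, c_out, rng):
    """Random (c_in, c_out) matrix with orthonormal rows, so H @ H.T == I."""
    if c_in > c_out:
        raise ValueError("a norm-preserving map needs c_in <= c_out")
    a = rng.standard_normal((c_out, c_in))
    q, r = np.linalg.qr(a)           # q: (c_out, c_in) with orthonormal columns
    q = q * np.sign(np.diag(r))      # sign fix so q is Haar-distributed
    return q.T                       # (c_in, c_out) with orthonormal rows


def delta_orthogonal(k, c_in, c_out, rng, gain=1.0):
    """Hypothetical helper: k x k kernel, zero except an orthogonal center tap."""
    w = np.zeros((k, k, c_in, c_out))
    w[k // 2, k // 2] = gain * random_orthogonal(c_in, c_out, rng)
    return w


def conv2d_same(x, w):
    """Naive stride-1 'same' cross-correlation with zero padding.

    x: (height, width, c_in); w: (k, k, c_in, c_out) with k odd.
    """
    k = w.shape[0]
    p = k // 2
    h, wd, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]))
    for a in range(k):
        for b in range(k):
            # each (a, b) tap mixes channels of a shifted copy of the image
            out += xp[a:a + h, b:b + wd, :] @ w[a, b]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 16, 8))
    h = x
    for _ in range(200):  # a deep *linear* stack, no nonlinearity
        h = conv2d_same(h, delta_orthogonal(3, 8, 8, rng))
    # close to 1.0, up to floating-point error
    print(np.linalg.norm(h) / np.linalg.norm(x))
```

This only checks the linear, norm-preserving property. It does not reproduce the mean field analysis, which governs how such kernels behave once nonlinearities and biases are added.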
