Abstract: This study explores zero initialization in artificial neural networks, mimicking synaptic resetting during sleep. Despite the common belief that zero initialization hinders learning by causing identical outputs, our approach diversifies outputs by initializing weights to zero and biases to random values. We evaluated models on Modified National Institute of Standards and Technology (MNIST), Canadian Institute for Advanced Research (CIFAR)-10, and CIFAR-100 datasets using multilayer perceptrons (MLPs), convolutional neural networks (CNNs), residual networks (ResNets), vision transformers (ViTs), and multilayer perceptron mixers (MLP-Mixers). Results showed mixed outcomes: while zero initialization can hinder learning in some cases, it can also match or surpass random initialization in others, especially in plain neural network configurations. Among contemporary deep learning models, MLP-Mixers with zero initialization matched the performance of fully randomly initialized counterparts, despite half of the learnable parameters being set to zero. This study challenges the conventional view that zero initialization inherently degrades neural network performance.
Seo et al. studied this question.
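The initialization described in the abstract (weights set to zero, biases drawn at random) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `init_layer` and `forward`, the uniform bias distribution, its scale of 0.1, and the tanh activation are all assumptions chosen for illustration. The sketch only shows that a layer with all-zero weights and zero biases gives every unit the same output, while random biases make the units differ.

```python
import math
import random


def init_layer(n_in, n_out, rng, random_bias=True, scale=0.1):
    """Return (weights, biases) for one dense layer with all weights zero.

    Biases are drawn uniformly from [-scale, scale] when random_bias is True.
    The distribution and scale are illustrative assumptions, not values
    taken from the paper.
    """
    weights = [[0.0] * n_in for _ in range(n_out)]
    if random_bias:
        biases = [rng.uniform(-scale, scale) for _ in range(n_out)]
    else:
        biases = [0.0] * n_out
    return weights, biases


def forward(x, weights, biases):
    """Apply an affine map followed by tanh (activation choice is assumed)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)
x = [0.5, -1.0, 2.0]

# Zero weights and zero biases: every unit produces the same output.
w_zero, b_zero = init_layer(3, 4, rng, random_bias=False)
all_zero_out = forward(x, w_zero, b_zero)

# Zero weights and random biases: outputs differ from unit to unit.
w_rb, b_rb = init_layer(3, 4, rng, random_bias=True)
rand_bias_out = forward(x, w_rb, b_rb)

print("zero weights, zero biases:  ", all_zero_out)
print("zero weights, random biases:", rand_bias_out)
```

Note that with zero weights the layer output at initialization does not depend on the input `x`; the diversity here is across units only, which is the property the abstract refers to.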