August 15, 2024 · Open Access

Improving Artificial Neural Network Performance with Zero Initialization

Abstract

This study explores zero initialization in artificial neural networks, mimicking synaptic resetting during sleep. Despite the common belief that zero initialization hinders learning by causing identical outputs, our approach diversifies outputs by initializing weights to zero and biases to random values. We evaluated models on Modified National Institute of Standards and Technology (MNIST), Canadian Institute for Advanced Research (CIFAR)-10, and CIFAR-100 datasets using multilayer perceptrons (MLPs), convolutional neural networks (CNNs), residual networks (ResNets), vision transformers (ViTs), and multilayer perceptron mixers (MLP-Mixers). Results showed mixed outcomes: while zero initialization can hinder learning in some cases, it can also match or surpass random initialization in others, especially in plain neural network configurations. Among contemporary deep learning models, MLP-Mixers with zero initialization matched the performance of fully randomly initialized counterparts, despite half of the learnable parameters being set to zero. This study challenges the conventional view that zero initialization inherently degrades neural network performance.
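The initialization scheme described in the abstract (weights set to zero, biases drawn at random) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the `tanh` activation, the normal bias distribution, and the `bias_scale` parameter are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def init_layer(rng, n_in, n_out, bias_scale=0.1):
    """Zero weights, random biases.

    bias_scale and the normal distribution are illustrative
    assumptions, not values taken from the paper.
    """
    W = np.zeros((n_in, n_out))                  # all weights start at zero
    b = rng.normal(0.0, bias_scale, size=n_out)  # biases break the symmetry
    return W, b

def forward(x, params):
    """Plain MLP forward pass; tanh is an assumed activation."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:                  # no activation on the output layer
            h = np.tanh(h)
    return h

rng = np.random.default_rng(0)
sizes = [784, 128, 10]                           # MNIST-like shapes, assumed
params = [init_layer(rng, n_in, n_out)
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(4, 784))                    # a dummy batch of 4 inputs
logits = forward(x, params)

# Baseline for comparison: zero weights AND zero biases.
all_zero = [(np.zeros((n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]
logits_all_zero = forward(x, all_zero)
```

In this sketch, `logits_all_zero` is identical across every output unit, which is the degenerate case the common belief refers to, whereas `logits` differs from unit to unit because each unit starts from its own bias. At initialization the outputs do not yet depend on the input, because every weight is zero; any input dependence has to emerge during training.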

Cite This Study

Seo et al. (2024) studied this question.

synapsesocial.com/papers/68e5c1e9b6db643587559642 https://doi.org/10.21203/rs.3.rs-4890533/v1
