What does this research mean for the field?

Layer-wise pruning of neural networks provides a better trade-off than global pruning, consistently reducing inference time, energy consumption, and memory usage while maintaining high accuracy. Novelty: incremental. Consensus alignment: neutral.

What question did this study set out to answer?

The research aims to analyze how different pruning strategies impact the efficiency of deep neural networks during inference and training.

March 21, 2026 | Open Access

Inference and training efficiency in pruned multilayer perceptron networks

Key Points

The research aims to analyze how different pruning strategies impact the efficiency of deep neural networks during inference and training.
Compared global pruning and layer-wise pruning approaches.
Tested on two multilayer perceptron models and a VGG-16 model.
Evaluated across five datasets: MNIST, FashionMNIST, EMNIST, CIFAR-10, and OctMNIST.
Considered two sparsity levels, 50% and 80%, against dense (0% sparsity) baselines.
Measured inference time, energy usage, accuracy, and training energy consumption.
Layer-wise pruning reduced inference energy usage and time while maintaining accuracy.
For the small-scale model on MNIST at 50% sparsity, inference energy usage and inference time each decreased by 33%.
In that configuration, accuracy decreased by only 0.49%.
Layer-wise pruning was preferred over global pruning for better energy efficiency.
Also investigated estimated CO2 emissions and peak memory usage.

Abstract

This study explores how pruning strategies can improve the efficiency of deep neural networks (DNNs), which are widely used for tasks such as image processing and medical diagnosis. Although DNNs are powerful, they often contain weaker connections that can lead to increased energy consumption during both training and inference. To address this, we compare two pruning approaches: global pruning, which applies to all layers of the network, and layer-wise pruning, which focuses on the hidden layers. These approaches are tested on two multilayer perceptron (MLP) models, small-scale and medium-scale, and are then extended to a VGG-16 model as a representative example of Convolutional Neural Networks (CNNs). We evaluate the impact of pruning on five datasets (MNIST, FashionMNIST, EMNIST, CIFAR-10, and OctMNIST) at two sparsity levels (50% and 80%). Our results show that, in comparison to the benchmark dense networks (0% sparsity), layer-wise pruning offers the best trade-offs by consistently reducing inference time and inference energy usage while maintaining accuracy. For example, training the small-scale model with the MNIST dataset and 50% sparsity led to a 33% reduction in inference energy usage, a 33% reduction in inference time, and only a negligible 0.49% decrease in accuracy. Furthermore, we investigate training energy consumption, estimated CO2 emissions, and peak memory usage, which again favor the layer-wise approach over global pruning. Overall, our findings suggest that layer-wise pruning is a practical approach for designing energy-efficient neural networks, particularly in achieving efficient trade-offs between performance and energy consumption.
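
The difference between the two approaches can be illustrated with a minimal sketch in plain Python. It rests on several assumptions that the abstract does not confirm: weights are ranked by absolute magnitude (one common reading of "weaker connections"), each layer is represented as a flat list of weights, and layer-wise pruning leaves the output layer untouched. The function names (prune_smallest, global_prune, layerwise_prune, sparsity_of) and the toy weights are hypothetical and are not taken from the study.

```python
def prune_smallest(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices of the k smallest-magnitude weights (assumed pruning criterion).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def global_prune(layers, sparsity):
    """Rank all weights of all layers together, then split back per layer."""
    flat = [w for layer in layers for w in layer]
    pruned = prune_smallest(flat, sparsity)
    result, start = [], 0
    for layer in layers:
        result.append(pruned[start:start + len(layer)])
        start += len(layer)
    return result


def layerwise_prune(layers, sparsity, skip=()):
    """Prune each layer independently; layers whose index is in `skip` stay dense."""
    return [
        list(layer) if idx in skip else prune_smallest(layer, sparsity)
        for idx, layer in enumerate(layers)
    ]


def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Toy weights (made up for illustration): the middle layer has a much smaller scale.
layers = [
    [0.9, -0.8, 0.7, -0.6, 0.5, -0.4, 0.3, -0.2],
    [0.05, -0.04, 0.03, -0.02],
    [1.0, -1.1, 1.2, -1.3],  # treated here as the output layer
]

global_result = global_prune(layers, 0.5)
layerwise_result = layerwise_prune(layers, 0.5, skip={2})

print([sparsity_of(layer) for layer in global_result])     # [0.5, 1.0, 0.0]
print([sparsity_of(layer) for layer in layerwise_result])  # [0.5, 0.5, 0.0]
```

In this toy case, global ranking removes every weight of the small-scale middle layer, while the layer-wise variant keeps each pruned layer at the requested 50% sparsity. The example only shows the mechanical difference between the two schemes; it says nothing about the energy, time, or accuracy figures reported in the study.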
