What question did this study set out to answer?

This study asks how convolutional neural networks reorganize information during learning on natural image classification tasks.

January 21, 2026 · Open Access

Uncovering Neural Learning Dynamics Through Latent Mutual Information

Key Points

The study examines how convolutional neural networks reorganize information during learning on natural image classification tasks.
The authors track mutual information (MI) between inputs, intermediate representations, and labels across VGG-16, ResNet-18, and ResNet-50.
They analyze how label-relevant MI and input MI change from layer to layer.
They apply inference-time knockouts, shuffles, and perturbations to test whether high-MI channels are functionally necessary.
They introduce a dependence-aware regularizer based on the Hilbert–Schmidt Independence Criterion.
Label-relevant MI grows reliably with depth across all three architectures.
Input MI depends strongly on architecture and activation type.
Inference-time knockouts, shuffles, and perturbations confirm that high-MI channels are functionally necessary for accuracy.
The proposed regularizer yields small accuracy gains and consistently faster convergence.
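
To make the idea of per-channel label-relevant MI concrete, the following is a minimal sketch of a histogram (plug-in) estimate of the MI between one scalar channel summary and a discrete label. The page does not state which MI estimator the study uses, so the estimator, the function name `binned_mutual_information`, and the `n_bins` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter


def binned_mutual_information(activations, labels, n_bins=8):
    """Plug-in estimate of I(T; Y) in bits.

    Hypothetical helper: T is a scalar summary of one channel (for example
    its spatial mean), Y is a discrete class label. Equal-width binning is
    an assumption, not the estimator used in the study.
    """
    lo, hi = min(activations), max(activations)
    # Fall back to a unit width when the channel is constant.
    width = (hi - lo) / n_bins or 1.0
    # Map each activation to a bin index in [0, n_bins - 1].
    bins = [min(int((a - lo) / width), n_bins - 1) for a in activations]

    n = len(labels)
    joint = Counter(zip(bins, labels))
    count_t = Counter(bins)
    count_y = Counter(labels)

    mi = 0.0
    for (t, y), c in joint.items():
        # p(t, y) * log2( p(t, y) / (p(t) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_t[t] * count_y[y]))
    return mi
```

Ranking channels by such a score would give one way to select "high-MI" channels for a knockout test. Plug-in estimates are biased upward for small samples, so the bin count should be kept small relative to the number of examples.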

Abstract

We study how convolutional neural networks reorganize information during learning in natural image classification tasks by tracking mutual information (MI) between inputs, intermediate representations, and labels. Across VGG-16, ResNet-18, and ResNet-50, we find that label-relevant MI grows reliably with depth while input MI depends strongly on architecture and activation, indicating that “compression” is not a universal phenomenon. Within convolutional layers, label information becomes increasingly concentrated in a small subset of channels; inference-time knockouts, shuffles, and perturbations confirm that these high-MI channels are functionally necessary for accuracy. This behavior suggests a view of representation learning driven by selective concentration and decorrelation rather than global information reduction. Finally, we show that a simple dependence-aware regularizer based on the Hilbert–Schmidt Independence Criterion can encourage these same patterns during training, yielding small accuracy gains and consistently faster convergence.
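
For readers unfamiliar with the Hilbert–Schmidt Independence Criterion (HSIC), the sketch below computes the standard biased empirical estimate, trace(K H L H) / (n - 1)^2, with Gaussian kernels. The abstract does not say which variables the regularizer compares, which kernel it uses, or how the term enters the loss, so the kernel choice, the bandwidth `sigma`, and the function names are assumptions. The code shows only the dependence statistic, not the authors' regularizer.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for the rows of x (shape: n x d)."""
    sq_norms = np.sum(x ** 2, axis=1, keepdims=True)
    sq_dists = sq_norms + sq_norms.T - 2.0 * x @ x.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between paired samples x and y.

    Larger values indicate stronger statistical dependence. Using the same
    bandwidth for both kernels is a simplifying assumption.
    """
    n = x.shape[0]
    K = rbf_kernel(x, sigma)
    L = rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_independent = rng.normal(size=(200, 1))

hsic_dependent = hsic(x, x)                # x is fully dependent on itself
hsic_independent = hsic(x, y_independent)  # should be much closer to zero
```

Because the estimate is built from differentiable matrix operations, the same computation can be written in an automatic-differentiation framework and added to a training loss as a penalty or reward on dependence.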
