May 24, 2026 | Open Access

Explaining representations in deep neural networks

Key Points

This thesis aims to understand the semantic nature of representations in deep neural networks and their impact on decision-making.
Systematic investigation of neuron relationships to uncover structural patterns.
Development of a method to label neural representations with human-understandable textual labels.
Creation of a unified framework to decompose model decisions into interpretable concept combinations.
Demonstrated enhanced model transparency through new frameworks and analyses.
Enabled targeted identification of biases and spurious correlations in model outputs.
Empirical validation indicating that the approaches improve interpretability and support accountability in real-world applications.

Abstract

The success of Deep Neural Networks (DNNs) is often attributed to their ability to learn powerful representations, which capture increasingly abstract features from data to make predictions. However, the semantic nature of these representations—specifically, what concepts they encode and how those abstractions are used in model decision-making—remains generally unknown. This thesis addresses this gap through a systematic investigation of the interpretability of latent representations in Computer Vision models, proposing novel frameworks to analyze, label, and explain their learned abstractions. We first analyze relationships among neurons to uncover structural patterns and spurious correlations within learned representations. Next, we present a method that describes neural representations with human-understandable textual labels, enabling precise identification of concepts captured by individual neurons. Building on these insights, we develop a unified framework that decomposes model decisions into sparse, interpretable concept combinations, thereby revealing how models leverage specific features during inference. Through empirical validation, we demonstrate how these approaches enhance model transparency, enable targeted identification of biases, and provide actionable insights for mitigating spurious correlations. Our work bridges the gap between empirical performance and interpretability, offering tools to make the decision-making processes of DNNs transparent while fostering trust and accountability in real-world applications.
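
To make the idea of decomposing a decision into a sparse combination of concepts concrete, here is a minimal sketch. It is not the framework proposed in the thesis. It assumes a purely linear classification head over concept-labelled neurons, so each term weight * activation is that concept's additive contribution to a class logit. The function names, the concept labels, the "husky" class, and all numbers are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the names, labels, and numbers below are
# hypothetical and are not taken from the thesis.

def concept_contributions(activations, weights, labels):
    """Return (label, contribution) pairs for one class logit of a linear head.

    For a linear head, logit = bias + sum_i weights[i] * activations[i], so
    each term weights[i] * activations[i] is one concept's additive share.
    """
    if not (len(activations) == len(weights) == len(labels)):
        raise ValueError("activations, weights, and labels must have equal length")
    return [(lab, w * a) for lab, w, a in zip(labels, weights, activations)]


def top_k_explanation(contributions, k):
    """Keep the k contributions with the largest magnitude (a sparse summary)."""
    ranked = sorted(contributions, key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]


# Hypothetical concept-labelled neurons feeding a "husky" class logit.
labels = ["snow", "pointed ears", "fur texture", "grass"]
activations = [2.0, 1.5, 1.0, 0.0]
weights = [2.0, 2.0, 0.5, -1.0]
bias = 0.25

contributions = concept_contributions(activations, weights, labels)
logit = bias + sum(value for _, value in contributions)
explanation = top_k_explanation(contributions, k=2)

print(f"logit = {logit}")
for label, value in explanation:
    print(f"{label}: {value:+.2f}")
```

In this toy setup the background concept "snow" contributes more to the logit than any feature of the animal itself, which is the kind of spurious correlation such a decomposition is meant to surface. Real vision models are not linear end to end, so a sketch like this would apply only to a final linear layer, or would need an attribution method in place of the plain product.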
