What type of study is this?

This is a quantitative study.

September 29, 2025

Open Access

The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models

Key Points

Interpretable categorical features emerge at distinct temporal (training checkpoint) and scale (model size) thresholds in large language models, across multiple domains.
Spatial analysis uncovered unexpected reactivation of early-layer semantic features in later layers, challenging standard assumptions about representational dynamics in transformer models.
Sparse autoencoders provide the mechanistic interpretability, identifying when and where specific semantic concepts emerge within neural activations.
The findings offer new insight into how large language models behave across training checkpoints and model scales.

Abstract

This paper studies the emergence of interpretable categorical features within large language models (LLMs), analyzing their behavior across training checkpoints (time), transformer layers (space), and varying model sizes (scale). Using sparse autoencoders for mechanistic interpretability, we identify when and where specific semantic concepts emerge within neural activations. Results indicate clear temporal and scale-specific thresholds for feature emergence across multiple domains. Notably, spatial analysis reveals unexpected semantic reactivation, with early-layer features re-emerging at later layers, challenging standard assumptions about representational dynamics in transformer models.
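
For readers unfamiliar with sparse autoencoders, the following is a minimal sketch of the general technique: a ReLU encoder maps an activation vector to a wider set of non-negative features, a linear decoder reconstructs the activation, and an L1 penalty encourages sparsity. It is not the paper's implementation. All function names, dimensions, weights, and the `l1_coeff` value are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Minimal sparse autoencoder (SAE) forward pass in pure Python.
# Every name, size, and weight below is an illustrative assumption,
# not a value taken from the paper.

def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Return (features, reconstruction) for one activation vector x."""
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    features = relu(pre)  # non-negative feature activations
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon

def sae_loss(x, features, recon, l1_coeff):
    """Mean squared reconstruction error plus an L1 sparsity penalty."""
    mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    l1 = sum(abs(f) for f in features)
    return mse + l1_coeff * l1

# Toy example: a 2-dimensional activation and 3 dictionary features.
x = [1.0, -2.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, -1.0]]
b_dec = [0.0, 0.0]

features, recon = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
loss = sae_loss(x, features, recon, l1_coeff=0.1)
```

In this toy setup only two of the three features are active for `x`, and the decoder reconstructs the input exactly, so the loss consists solely of the sparsity term. In practice the weights would be learned by minimizing this loss over many activation vectors collected from a given layer and checkpoint.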
