Deep learning is usually described through architectures, benchmarks, datasets, and scale: convolutional networks, Transformers, ImageNet, BERT, GPT, and ever larger models, corpora, and compute budgets. This paper argues that such descriptions miss a deeper organizing principle: the major transitions in deep learning are transitions in what models learn to generalize. It proposes a three-level framework of generalization. Level 1 is supervised cross-sample generalization: models learn task-specific mappings from labeled examples and apply them to unseen samples. Level 2 is self-supervised representation generalization: models learn reusable internal representations from data-derived supervision and transfer them across downstream tasks. Level 3 is autoregressive and predictive high-dimensional relational generalization: models learn predictive structure over broad symbolic or multimodal streams, and the predictive process itself becomes a capability interface. The distinction between BERT and GPT is therefore not merely architectural. BERT makes representations reusable; GPT makes prediction interactive. This difference marks a transition from representation transfer to predictive interaction. The framework explains progress in deep learning as an expansion of the object of generalization: from task mappings, to reusable representations, to predictive relational structures. It also suggests that future advances will depend not only on scale, but also on discovering richer generalization objects and more powerful interfaces through which learned structure can be used.
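As a rough illustration of the three levels described above, the sketch below shows where the training signal comes from in each case, using one toy token sequence. It is a minimal sketch under stated assumptions, not the paper's method: the function names, the `[MASK]` symbol, the example sentence, and the label `"neutral"` are all hypothetical and chosen only for illustration. Masked-token prediction stands in for Level 2 (BERT-style) and next-token prediction for Level 3 (GPT-style).

```python
from typing import Dict, List, Tuple

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def supervised_pair(tokens: List[str], label: str) -> Tuple[List[str], str]:
    """Level 1: the target is an external, human-provided label."""
    return list(tokens), label


def masked_pairs(
    tokens: List[str], positions: List[int]
) -> Tuple[List[str], Dict[int, str]]:
    """Level 2 (BERT-style): hide chosen tokens; the targets are the hidden tokens."""
    corrupted = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return corrupted, targets


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Level 3 (GPT-style): each prefix is paired with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


tokens = ["the", "cat", "sat", "down"]  # toy sequence (assumption)
level1 = supervised_pair(tokens, "neutral")  # label supplied from outside the data
level2 = masked_pairs(tokens, [1])           # supervision derived from the data
level3 = next_token_pairs(tokens)            # supervision is the continuation itself
```

In this sketch only Level 1 needs a label from outside the data. Level 3 pairs every prefix with its continuation, which is why the same predictive step used in training can also serve as the way the model is queried.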
Zi Wang (Fri,) studied this question.