This work introduces a unified conceptual and mathematical framework for understanding deep learning architectures (MLPs, CNNs, RNNs, Transformers) as compositions of value attribution, contextual value transformation, and explicit function approximation. The paper clarifies structural distinctions between raw data processing, context construction, and explicit task-level computation, and proposes a perspective under which convolution and attention are interpreted as relational operators acting prior to a multilayer perceptron.

Supplementary Materials

This version includes animated figures (MP4 format) provided as supplementary materials. These animations visually illustrate the unified framework proposed in the paper, namely: (i) value attribution through embeddings, (ii) contextual value transformation via convolution and attention mechanisms, and (iii) explicit function approximation by a multilayer perceptron. The animated figures are intended to complement the static figures in the document and to facilitate conceptual understanding of the theoretical framework.
Hamaoui et al. (Tue,) studied this question.
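
The three stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the random weights, the choice of single-head self-attention as the relational operator, the mean pooling step, and all function names (`attribute_values`, `contextualize`, `mlp`, `forward`) are assumptions made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
vocab_size, d_model, d_hidden, n_classes = 10, 8, 16, 3

# (i) Value attribution: an embedding table assigns a vector to each token id.
embedding = rng.normal(size=(vocab_size, d_model))

# (ii) Contextual value transformation: single-head self-attention weights.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

# (iii) Explicit function approximation: a two-layer MLP.
W1 = rng.normal(size=(d_model, d_hidden))
b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(d_hidden, n_classes))
b2 = np.zeros(n_classes)


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attribute_values(token_ids):
    # Look up one embedding vector per token.
    return embedding[token_ids]


def contextualize(x):
    # Each position is rewritten as a weighted mix of all positions.
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    weights = softmax(q @ k.T / np.sqrt(d_model))
    return weights @ v, weights


def mlp(x):
    # ReLU hidden layer followed by a linear output layer.
    hidden = np.maximum(x @ W1 + b1, 0.0)
    return hidden @ W2 + b2


def forward(token_ids):
    values = attribute_values(token_ids)
    context, weights = contextualize(values)
    pooled = context.mean(axis=0)  # assumption: mean pooling before the MLP
    return mlp(pooled), weights


token_ids = np.array([1, 4, 2, 7])
logits, weights = forward(token_ids)
print(logits.shape, weights.shape)
```

In this sketch the attention step acts on the embedded values before the multilayer perceptron sees them, which mirrors the ordering described above. A convolution could be substituted for `contextualize` under the same reading.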