Abstract
This paper proposes a theoretical model of speech code formation, without empirical extension. It builds on positions the author previously formulated in “From Microgesture to Vocal: The Formation of a Speech Code” and, in the present version, relates them to contemporary theories of the gestural and polysemiotic origins of language. The model describes the mechanism underlying the historical transition from image-based and gestural communication to the vocal transmission of meaning. Its starting point is that semantics is an internal cognitive image that forms before linguistic expression and precedes the act of communication. Gesture is regarded as a bodily instrument that externalizes this internal image and initiates its associative reconstruction in another participant. Within the model, language is interpreted as a historically formed system of bodily, image-motivated means of semantic transmission. Special attention is given to the microgesture, understood as a rationalized and compressed bodily action that carries image-based semantics, alongside the proto-gesture and the classical gesture. What distinguishes the microgesture is that, at this historical stage, it becomes embedded in the human articulatory apparatus, so that its articulation is accompanied by a vocal projection. The paper argues that this process leads to the formation of the speech element (hereafter SE) as a vocal projection of bodily action, an “acoustic shadow” of the microgesture. The result is a continuous model of image-based categorization that encompasses all stages of speech formation: from the internal image to gesture, microgesture, and the speech element.
The paper relates the proposed model to existing theories of the gestural and polysemiotic origins of language, and to cognitive-semantic approaches that treat the image as the foundation of linguistic meaning. The resulting theoretical framework integrates cognitive imagery, bodily communication, and vocal speech into a unified evolutionary and semiotic model of speech code formation.
Sergei Gennadevich Zaitsev studied this question.