What question did this study set out to answer?

The aim is to enhance information-based similarity measurement by addressing limitations of classical algorithmic information theory at the syntactic level.

May 21, 2026 · Open Access

Semantic Algorithmic Information Theory: From Kolmogorov Complexity to Semantic Equivalence

Key points

Introduced the Semantic Turing Machine System (STMS) to formalize abstract concepts decoupled from their syntax.
Developed a model-based estimator for the Normalized Semantic Information Distance (NSID) that uses neural autoregressive models as conditional probability estimators.
Conducted experimental validation and comparative analysis to evaluate performance against classical metrics.
NSID suppresses syntactic variance while maintaining semantic structure.
Empirical evidence shows that NSID improves upon classical syntactic metrics in measuring cross-representational equivalence.

Abstract

Classical Algorithmic Information Theory (AIT) provides a rigorous foundation for information-based similarity measurement, but classical formulations and their compression-based approximations largely operate at the syntactic level, making them sensitive to surface-level variation and insufficient for semantic equivalence. To address this limitation, this paper introduces Semantic Algorithmic Information Theory. The contributions are organized around three core aspects. First, regarding algorithmic extension, we formalize the Semantic Turing Machine System (STMS) to decouple abstract concepts from their diverse syntactic realizations. Within this framework, Semantic Complexity is defined as the minimum program length required to generate some realization in a synonymous set, thereby characterizing compact meaning representation. Second, to enable approximate computation, we move from the ideal, uncomputable semantic information distance to a model-based direct estimator of the Normalized Semantic Information Distance (NSID), which uses neural autoregressive models as conditional probability estimators. Finally, through experimental validation and comparative analysis, we show that the NSID estimator suppresses syntactic variance while preserving semantic structure. Empirical results indicate that NSID provides a practical, computable surrogate for semantic distance and improves upon classical syntactic metrics in evaluating cross-representational equivalence.
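To make the contrast between syntactic and model-based distances concrete, the following is a minimal sketch, not the paper's implementation. `ncd` is the standard Normalized Compression Distance computed with zlib, which stands in for the classical compression-based approximations mentioned in the abstract. `normalized_distance` is an assumed plug-in form of the normalized information distance, max(L(x|y), L(y|x)) / max(L(x), L(y)), where the code-length functions are supplied by the caller. The names `normalized_distance` and `zlib_cond_len` are hypothetical, and the excerpt does not give the exact NSID formula. With a neural autoregressive model, one would presumably supply the negative log-likelihood in bits as the code length, but that is an assumption here.

```python
import zlib
from typing import Callable


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input, a computable stand-in for K(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: the classical, purely syntactic baseline."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def normalized_distance(
    x,
    y,
    code_len: Callable[[object], float],
    cond_code_len: Callable[[object, object], float],
) -> float:
    """Assumed plug-in form: max(L(x|y), L(y|x)) / max(L(x), L(y)).

    code_len(x) estimates L(x); cond_code_len(x, y) estimates L(x|y).
    Both are supplied by the caller (a compressor here, a model in principle).
    """
    denom = max(code_len(x), code_len(y))
    if denom <= 0:
        raise ValueError("code lengths must be positive")
    return max(cond_code_len(x, y), cond_code_len(y, x)) / denom


def zlib_cond_len(x: bytes, y: bytes) -> float:
    """Hypothetical stand-in for L(x|y): extra compressed bytes for x once y is known."""
    return max(compressed_len(y + x) - compressed_len(y), 0)


if __name__ == "__main__":
    # Two surface forms of the same statement: a syntactic metric sees them as far apart.
    a = b"two plus three equals five"
    b = b"2 + 3 = 5"
    print("NCD(a, a):", ncd(a, a))
    print("NCD(a, b):", ncd(a, b))
    print("plug-in distance:", normalized_distance(a, b, compressed_len, zlib_cond_len))
```

Because zlib only exploits repeated byte patterns, both distances here remain syntactic. The sketch shows only where a semantic conditional estimator would be swapped in, through the `code_len` and `cond_code_len` arguments.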
