
April 26, 2026 | Open Access

Grammatical–Ontological Contamination in Large Language Models - Empirical Evidence from the SDI-GOC v2.0 Protocol

Key Points

This study measures how Large Language Models represent knowledge from non-Western ontological traditions, using the SDI-GOC protocol.
The SDI-GOC v2.0 protocol comprises 30 probes across six hierarchical levels, administered under three framing conditions.
The protocol was administered to ten LLM configurations from four independent companies.
Responses were scored with an Ontological Polarity Score (OPS), computed by a negation-aware weighted lexical pipeline.
Five universal signatures emerged, including an L1–L2 cascade with a mean OPS drop of −0.537.
Frame B produced a mean uplift of +0.564, indicating that relational knowledge exists in model parameters but is inaccessible by default.
Only two of the ten configurations showed genuine ontological competence (C/B > 0.50); the most capable Anthropic and Google models showed negative C/B ratios.
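
The headline figures in these key points are simple differences and thresholds, and a short sketch may make them easier to check against the abstract. The helper names below are illustrative and are not taken from the published pipeline. The abstract does not define how the C/B ratio is computed, so `classify_cb` takes the ratio as given. Only the competence threshold (C/B > 0.50) comes from the source; treating negative ratios as counter-competence and everything in between as compliance is an assumption based on the reported values.

```python
# Illustrative helpers; names and the compliance band are assumptions,
# not part of the published SDI-GOC scoring pipeline.

def cascade_drop(l1_mean: float, l2_mean: float) -> float:
    """OPS change from Grammatical (L1) to Ontological (L2) probes."""
    return round(l2_mean - l1_mean, 3)


def frame_b_uplift(frame_a: float, frame_b: float) -> float:
    """OPS gain of the explicit relational frame (B) over the neutral frame (A)."""
    return round(frame_b - frame_a, 3)


def classify_cb(ratio: float) -> str:
    """Place a precomputed C/B ratio on the competence spectrum."""
    if ratio > 0.50:   # threshold stated in the source
        return "competence"
    if ratio < 0:      # assumed: negative ratios mark counter-competence
        return "counter-competence"
    return "compliance"  # assumed: the remaining band


# Cross-model means reported in the abstract: L1 = +0.385, L2 = -0.152.
print(cascade_drop(0.385, -0.152))

# Ratios reported in the abstract for three configurations.
for name, ratio in [("Claude Sonnet Basic", 0.63),
                    ("GPT Basic", 0.61),
                    ("Claude Opus Extended", -0.59)]:
    print(name, classify_cb(ratio))
```

With the reported means, the cascade drop reproduces the stated −0.537. The two configurations above 0.50 fall in the competence band, and Opus falls in the counter-competence band.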

Abstract

We present the SDI-GOC (Structured Distributed Introspection – Grammatical–Ontological Cascade) protocol, a diagnostic instrument for measuring how Large Language Models represent knowledge from non-Western ontological traditions. The protocol comprises 30 probes across six hierarchical levels (Grammatical, Ontological, Epistemological, Counter-Intuitive, Control, Trap), administered under three framing conditions: neutral (Frame A), explicit relational instruction (Frame B), and authenticity cue (Frame C). Responses are scored using a negation-aware weighted lexical pipeline computing an Ontological Polarity Score (OPS) ranging from −1 (fully substance-ontological) to +1 (fully process-relational). We administered the SDI-GOC v2.0 protocol to ten LLM configurations from four independent companies (Anthropic: Claude Sonnet 4.6 Basic/Extended, Claude Opus 4.6 Extended; OpenAI: ChatGPT 5.2 Instant/Basic/Extended; Google: Gemini 3.1 Pro at T=0 and T=1, Gemini 3 Flash; DeepSeek: DeepSeek 3.2 Deep), yielding approximately 480 individually scored responses. Five universal signatures emerged across all ten configurations without exception: (i) an L1–L2 cascade in which OPS drops from Grammatical to Ontological probes (cross-model mean L1: +0.385; L2: −0.152; mean drop: −0.537); (ii) universal Frame B uplift (range: +0.331 to +0.987; mean: +0.564), demonstrating that relational knowledge exists in model parameters but is inaccessible by default; (iii) a substance diagnostic default across all models (range: −0.397 to −0.009; mean: −0.231); (iv) zero explicit identity errors; (v) discriminant trap validity. Frame C results reveal a competence/compliance/counter-competence spectrum measured by the C/B ratio. Genuine ontological competence (C/B > 0.50) was found in only two configurations (Claude Sonnet Basic: 0.63; GPT Basic: 0.61), which converge at a shared ceiling of approximately 0.63 despite different architectures. 
The most capable models from both Anthropic (Opus: C/B = −0.59) and Google (Gemini Pro T=1: C/B = −0.41) exhibit counter-competence, where the authenticity cue deepens substance-ontological framing. DeepSeek exhibits the most extreme profile: deepest Frame A default (−0.544) yet largest B–A uplift (+0.987). Probe 2.3 (āma formation) constitutes the deepest attractor across all models (mean Frame A ≈ −0.80). These findings establish grammatical–ontological contamination as a universal, structural property of English-language LLM training, not a knowledge gap, and quantify the limits of current mitigation strategies. The complete protocol, scoring pipeline, and raw data are publicly available.
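
To make the scoring idea concrete, here is a minimal sketch of what a negation-aware weighted lexical score bounded in [−1, +1] could look like. It is not the authors' pipeline. The lexicon entries, their weights, the negator list, the three-token negation window, and the normalisation by total absolute weight are all assumptions chosen for illustration.

```python
import re

# Hypothetical lexicon: positive weights mark process-relational vocabulary,
# negative weights mark substance-ontological vocabulary. Illustrative only.
LEXICON = {
    "process": 1.0,
    "relational": 1.0,
    "arises": 0.5,
    "substance": -1.0,
    "entity": -0.5,
}
NEGATORS = {"not", "no", "never", "without"}
WINDOW = 3  # assumed: a negator within the 3 preceding tokens flips polarity


def ops(text: str) -> float:
    """Return a polarity score in [-1, 1]; 0.0 when no lexicon term occurs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    signed = 0.0  # sum of weights after negation handling
    total = 0.0   # sum of absolute weights, used to normalise
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok)
        if weight is None:
            continue
        # Look back over a short window for a negator.
        negated = any(t in NEGATORS for t in tokens[max(0, i - WINDOW):i])
        signed += -weight if negated else weight
        total += abs(weight)
    return signed / total if total else 0.0


print(ops("It is a substance."))
print(ops("It is not a substance; it arises as a process."))
```

Dividing by the total absolute weight keeps the score inside the −1 to +1 range described in the abstract regardless of response length. The look-back window is what keeps "not a substance" from being counted as substance-ontological language.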
