What question did this study set out to answer?

The aim is to improve knowledge retrieval by creating a dual-layer SPO architecture that mimics human reasoning processes.

March 29, 2026 · Open Access

Dual-Layer SPO Architecture for Embedding-Based Index Ranking

Key Points

Extended the ontological model from 5 to 8 roles by decomposing the HOW role into three distinct sub-roles.
Extracted SPO graphs from various source types, assigning each triple an ontological role using a priority-ordered classifier.
Developed ranking formulas combining embedding similarity and authority tiers for improved retrieval scoring.
Created a controlled semantic shift mechanism to reposition meanings in the index without retraining.
The base system reaches 99.95% accuracy on 246 specification queries; the SPO extension is layered on top of that deterministic foundation.
Established a new framework where triples are treated as facts rather than hypotheses, enhancing interpretative capabilities.
Enabled multi-corpus scoring without the need for retraining the embedding model.

Abstract

This document describes an extension to the deterministic navigation architecture presented in: Chudinov, Y. (2026). Skill Without Training: Deterministic Knowledge Navigation for Large Language Models over Structured Documents. DOI: 10.5281/zenodo.18944351.

The base system uses a 5-role ontological model (WHAT / WHY / HOW / WHEN / WHERE) for coarse-grained index routing. This work extends it with a dual-layer SPO (Subject–Predicate–Object) architecture that enables fine-grained embedding-based ranking over structured normative documents.

The long-term goal is to build a progressive SPO index model equivalent to the human reasoning apparatus: a system where a question triggers the same deductive process a domain expert performs (identify what is known, rank by authority, discard redundancy, and assemble a traceable answer). The index encodes the expert's knowledge; chain reasoning replicates the expert's method.

A further objective is to provide tools for controlled semantic shift: mechanisms that allow deliberate, traceable repositioning of meaning within the index without retraining, so that evolving domain knowledge is absorbed through structural reclassification rather than stochastic weight updates.

Contributions

Extended ontological model (5 → 8 roles). The HOW role is decomposed into three semantically distinct sub-roles: structural composition (HOW), normative prescription (DO-prescribe), and observable behavior (DO-act / DO-react). The ontological layer (WHAT, WHY, WHEN) remains intact. This resolves the overloading problem in which a single role conflated structure, obligation, and behavior.

SPO graph extraction pipeline. A typed Subject–Predicate–Object graph is extracted from specification text using predicate classification across 6 source types (including canonical definitions, normative matrices, design rationale, and full-text scan). Each triple is assigned one of 8 ontological roles via a strict priority-ordered classifier.
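The priority-ordered role assignment described above can be illustrated with a minimal sketch. The abstract does not give the actual priority order or predicate vocabulary, so the rule table, the keyword sets, the `Triple` type, and the `classify` function below are all hypothetical; only the eight role names come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Triple:
    """A Subject-Predicate-Object triple (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical rule table: (role, predicate keywords). Order encodes priority;
# the first rule whose keywords match wins. The real order is not published.
RULES: List[Tuple[str, FrozenSet[str]]] = [
    ("DO-prescribe", frozenset({"must", "shall", "required"})),
    ("DO-react", frozenset({"rejects", "responds", "raises"})),
    ("DO-act", frozenset({"sends", "emits", "computes"})),
    ("HOW", frozenset({"contains", "contain", "consists", "includes"})),
    ("WHERE", frozenset({"located", "defined"})),
    ("WHEN", frozenset({"after", "before", "until"})),
    ("WHY", frozenset({"because", "prevents", "enables"})),
]

FALLBACK_ROLE = "WHAT"  # assumed default when no rule matches


def classify(triple: Triple) -> str:
    """Return exactly one role for the triple using strict first-match priority."""
    tokens = set(triple.predicate.lower().replace("-", "_").split("_"))
    for role, keywords in RULES:
        if tokens & keywords:
            return role
    return FALLBACK_ROLE


# "must_contain" matches both the DO-prescribe and HOW keyword sets;
# the strict ordering resolves the conflict in favour of DO-prescribe.
print(classify(Triple("message", "must_contain", "header")))
print(classify(Triple("header", "consists_of", "fields")))
```

A first-match table keeps the assignment deterministic and auditable: every triple receives exactly one role, and the rule that fired can be reported alongside it.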

Tier assignment by source proximity. SPO chains are ranked by proximity to canonical definitions (T1: primary normative source, T2: direct association, T3: cross-file mention), providing an authority signal independent of embedding similarity.

Cross-layer promotion mechanism. Strong physical-layer coverage (low-tier HOW/WHERE/DO chains) promotes associated ontological-layer chains (WHAT/WHY/WHEN), operationalizing the principle that physical specification implies semantic relevance.

Ranking formula. Retrieval scoring is reduced to embedding_similarity × tier_weight, combining neural relevance with structural authority in a single pass.

Paradigm shift from induction to deduction. The architecture redefines the role of standard mathematical components (embedding similarity, triple representation, authority propagation). Triples are not hypotheses to be verified (as in TransE/RotatE link prediction) but facts to be interpreted: the SPO index determines the ontological role and authority tier of each existing text fragment. Softmax-derived similarity serves as a discovery mechanism for the deductive engine, not as a token generation mechanism. The result: a trained neural network in inference mode becomes an expert system without domain-specific pre-training. The model contributes reasoning bandwidth; the index contributes domain facts.

Tier deprecation as reasoning control. A systematic authority gradient (T1 → T2 → T3) controls chain reasoning drift: the further a fact is from the canonical source, the less it contributes to the final assembly.

Corpus-adaptive routing. The index set itself serves as the weight profile: each corpus receives a derived role-weight vector that shapes ranking without retraining the embedding model. Multi-corpus scoring becomes a weighted combination of per-corpus relevance.

Index automation pipeline. SPO-driven generation of aspect indices from new document corpora, reducing manual index construction from weeks to hours while preserving tier assignment integrity.

Compound role invariant. A formal invariant for information rerouting when constructs serve multiple ontological roles simultaneously, resolving the dominant-role problem through NDP (Non-Dominant Promotion) and strict/full pattern thresholds.

Controlled semantic shift. Mechanisms for deliberate, traceable repositioning of meaning within the index without retraining: evolving domain knowledge is absorbed through structural reclassification rather than stochastic weight updates.

Relationship to Prior Work

The base navigation system (DOI: 10.5281/zenodo.18944351) achieves 99.95% accuracy on 246 specification queries using 14 pre-compiled indices and deterministic chain resolution. The SPO extension described here adds a trainable embedding layer on top of the deterministic foundation, preserving the auditability and reproducibility of the base system while enabling similarity-based ranking for queries that fall outside the pre-compiled index vocabulary.

This architecture builds on established mathematical foundations (RAG retrieval, knowledge graph embeddings, authority-based ranking) but assigns each component a new function within a deductive frame. Retrieval replaces generation (not augments it); triples index existing relationships (not predict missing ones); authority drives deterministic assembly (not probabilistic ranking).

This is an active working document: since 2025-12-16, it serves as the design basis for the index servicing system under development.

Access

This deposit is restricted. The document is available for review upon request. Contact the author for access.

Patent Status

Patent pending. This architecture is covered by provisional patent applications filed prior to this deposit.
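The ranking formula listed under Contributions (embedding similarity multiplied by a tier weight) can be made concrete with a small sketch. The abstract does not state the tier weight values or the similarity measure, so the weights in `TIER_WEIGHT`, the use of cosine similarity, and the `Chain`, `cosine`, and `rank` names are assumptions chosen for illustration only.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical weights for the authority gradient T1 -> T2 -> T3.
# The abstract only says that authority decreases with distance from the
# canonical source; these numbers are illustrative.
TIER_WEIGHT = {"T1": 1.0, "T2": 0.7, "T3": 0.4}


@dataclass(frozen=True)
class Chain:
    """An SPO chain with its authority tier and a precomputed embedding."""
    text: str
    tier: str
    embedding: Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, assumed here as the embedding similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: Sequence[float], chains: Sequence[Chain]) -> List[Tuple[float, Chain]]:
    """Score each chain as similarity * tier weight, highest score first."""
    scored = [(cosine(query, c.embedding) * TIER_WEIGHT[c.tier], c) for c in chains]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


query = (1.0, 0.0)
chains = [
    Chain("cross-file mention", "T3", (1.0, 0.0)),    # similarity 1.0, weight 0.4
    Chain("canonical definition", "T1", (0.6, 0.8)),  # similarity 0.6, weight 1.0
]
for score, chain in rank(query, chains):
    print(f"{score:.2f}  {chain.tier}  {chain.text}")
```

In this toy case the canonical T1 chain outranks the T3 mention even though the T3 embedding is closer to the query, which is the behaviour the authority gradient is meant to produce.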
