What question did this study set out to answer?

This work aims to reduce the cost of LLM-backed knowledge-graph question answering by using a self-distilling neuro-symbolic cascade that shifts queries from the language model to cheaper tiers.

June 14, 2026 | Open Access

Cost-Amortised Knowledge-Graph Question Answering via Self-Distilling Neuro-Symbolic Cascades

Key Points

Developed TACET, a three-tier cascade: a Datalog rule engine, a ComplEx link predictor, and an LLM teacher.
Implemented an online distillation loop to adaptively migrate workload from an expensive language model to cheaper components.
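Tested on a controlled KGQA benchmark (8 seeds) and on MetaQA, a 43k-entity movie knowledge graph, with a real Grok 4.3 teacher (3 seeds).
Achieved 98.1% accuracy on the controlled benchmark while reducing blended cost by approximately 3.9x relative to an LLM-only system, under a simulated per-tier cost model.
Amortised measured LLM cost on MetaQA by 2.8x for 1-hop and 5.1x for 2-hop queries in a controlled design with Tier 2 disabled; this saving comes from answer reuse, not from rule distillation.
With an oracle teacher, the rule miner recovered a generalising composition rule and made 87% fewer teacher calls than a cache; with the noisy real LLM it recovered no installable rule, so the cascade reduced to a cache.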
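
As a rough illustration of what a blended cost ratio measures, the sketch below computes an expected per-query cost from a routing distribution and per-tier prices. The function name `blended_cost`, the tier labels, and every number are hypothetical and chosen only for the arithmetic; none of them come from the paper. The sketch also assumes each query is charged only for the tier that answers it, which may differ from the paper's cost model.

```python
def blended_cost(routing, tier_cost):
    """Expected cost per query: sum over tiers of (share routed) * (cost per query)."""
    assert abs(sum(routing.values()) - 1.0) < 1e-9, "routing shares must sum to 1"
    return sum(share * tier_cost[tier] for tier, share in routing.items())


# Hypothetical inputs for illustration only; not taken from the paper.
TIER_COST = {"datalog": 1.0, "link_predictor": 5.0, "llm": 100.0}
ROUTING = {"datalog": 0.5, "link_predictor": 0.25, "llm": 0.25}

cascade = blended_cost(ROUTING, TIER_COST)        # 0.5 + 1.25 + 25.0 = 26.75
llm_only = blended_cost({"llm": 1.0}, TIER_COST)  # every query pays the LLM price
reduction = llm_only / cascade                    # cost ratio relative to LLM-only
print(f"cascade={cascade}, llm_only={llm_only}, reduction={reduction:.2f}x")
```

Under this model the ratio improves as the routing share of the `llm` tier shrinks, which is the effect the distillation loop is meant to produce over a streamed workload.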

Abstract

Large language models make knowledge-graph question answering (KGQA) fluent, but every query pays the full price of an LLM call. We present TACET (Latin: "it is silent"), a self-distilling neuro-symbolic cascade that amortises that cost by progressively migrating a streamed workload off the LLM and onto cheap, checkable tiers — so the expensive teacher progressively falls silent. The cascade has three tiers: a sound forward-chaining Datalog rule engine (Tier 1, which abstains when it cannot prove an answer), a confidence-calibrated ComplEx link predictor verified against a typed ontology (Tier 2), and an LLM teacher (Tier 3). The central mechanism is an online distillation loop: whenever the teacher answers, TACET mines its answers into Datalog-checkable Horn rules and writes the facts back, so the routing distribution drifts toward the cheap tiers as the workload streams. Crucially, a synthesised rule generalises to entities the teacher never saw, which an answer cache cannot do. On a controlled KGQA benchmark (8 seeds), the cascade answers at 98.1% accuracy while reducing blended cost by ~3.9x relative to an LLM-only system under a simulated per-tier cost model. We then confirm this on real data with a real teacher: streaming MetaQA (a 43k-entity movie KG) through a real Grok 4.3 teacher over 3 seeds in a controlled design (Tier-2 disabled and a single teacher answer shared across arms, so accuracy is matched by construction), TACET amortises measured LLM dollars by 2.8x (1-hop) and 5.1x (2-hop) in pooled cost — the per-seed cost ratio spans 1.5–6.3x on 1-hop because only non-empty teacher answers are cached. This saving is delivered by answer reuse (the cascade's caching tier); under the real teacher, rule distillation adds no dollar advantage — full distillation ties the cache on every seed. 
The distillation-over-caching effect is itself teacher-quality-gated: with an oracle teacher the miner recovers a generalising composition rule and makes 87% fewer teacher calls than a cache (it answers unseen heads, a cache cannot), but under the noisy real LLM the miner recovers no installable rule and the cascade reduces to a cache. Tier-1 answers carry replayable Datalog proof trees, and we prove an ontology-preservation guarantee for the synthesised rules. We release the implementation, the benchmark generator, and the full experiment grid at the linked repository.
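
The difference between a synthesised rule and an answer cache can be sketched on a toy example. The code below is a minimal illustration, not the TACET implementation: the knowledge graph, the entity and relation names, and the helpers `compose`, `oracle_teacher`, `mine_rule`, and `run_stream` are all hypothetical. It assumes an oracle teacher, a two-relation composition rule as the only rule shape, and no Tier 2. A plain function stands in for the Datalog engine, which abstains when it derives nothing.

```python
from itertools import product

# Toy KG with hypothetical entities: (head, relation) -> set of tails.
KG = {
    ("film_a", "directed_by"): {"dir_1"},
    ("film_b", "directed_by"): {"dir_2"},
    ("film_c", "directed_by"): {"dir_1"},
    ("film_d", "directed_by"): {"dir_3"},
    ("film_a", "released_in"): {"1999"},
    ("dir_1", "born_in"): {"city_x"},
    ("dir_2", "born_in"): {"city_y"},
    ("dir_3", "born_in"): {"city_x"},
}
BASE_RELATIONS = sorted({relation for _, relation in KG})


def compose(head, r1, r2):
    """Evaluate the Horn-rule body r1(head, Z), r2(Z, tail) over the KG."""
    tails = set()
    for mid in KG.get((head, r1), set()):
        tails |= KG.get((mid, r2), set())
    return tails


def oracle_teacher(head, relation):
    """Stand-in for the Tier-3 teacher; always right on this toy relation."""
    return compose(head, "directed_by", "born_in")


def mine_rule(examples, min_support=2):
    """Return a pair (r1, r2) that reproduces every teacher answer, else None."""
    if len(examples) < min_support:
        return None
    for r1, r2 in product(BASE_RELATIONS, repeat=2):
        if all(compose(head, r1, r2) == tails for head, tails in examples):
            return r1, r2
    return None


def run_stream(queries, distill):
    """Answer a query stream; return (answers, number of teacher calls)."""
    cache, rules, examples = {}, {}, {}
    answers, teacher_calls = [], 0
    for head, relation in queries:
        answer = None
        if relation in rules:  # Tier 1: apply an installed rule
            answer = compose(head, *rules[relation]) or None  # empty -> abstain
        if answer is None:
            answer = cache.get((head, relation))  # answer reuse
        if answer is None:
            answer = oracle_teacher(head, relation)  # expensive fallback
            teacher_calls += 1
            if answer:  # only non-empty teacher answers are cached
                cache[(head, relation)] = answer
                if distill:
                    seen = examples.setdefault(relation, [])
                    seen.append((head, answer))
                    rule = mine_rule(seen)
                    if rule:
                        rules[relation] = rule
        answers.append(answer)
    return answers, teacher_calls


QUERIES = [(film, "director_birthplace")
           for film in ["film_a", "film_b", "film_c", "film_d", "film_a"]]
cache_answers, cache_calls = run_stream(QUERIES, distill=False)
rule_answers, rule_calls = run_stream(QUERIES, distill=True)
print(cache_calls, rule_calls)  # prints "4 2"
```

In this toy stream the cache saves only the repeated `film_a` query, while the mined rule also answers `film_c` and `film_d`, heads the teacher was never asked about. If the teacher's answers were noisy, the exact-match check in `mine_rule` would find no consistent rule and the distilling run would behave like the cache-only run, which mirrors the teacher-quality gating described in the abstract.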
