Decoder-only language models (LMs) store factual knowledge directly in their parameters, which results in large model sizes, costly retraining when facts change, and limited controllability in knowledge-intensive information systems. These models frequently mix stored knowledge with user-provided context, which leads to hallucinations and reduces reliability. To address these limitations, we propose KIT (Knowledge-Injected Transformer), a modular encoder–decoder architecture that separates syntactic competence from factual knowledge representation. In KIT, the decoder is pre-trained on knowledge-agnostic narrative corpora to learn language structure, while the encoder is trained independently to compress structured facts into compact latent representations. During joint training, the decoder learns to decompress these representations and generate accurate, fact-grounded responses. The modular design provides three key benefits: (1) factual knowledge can be updated by retraining only the encoder, without modifying decoder weights; (2) strict domain boundaries can be enforced, which gives a structural foundation for reducing knowledge-source confusion and hallucinations, although this effect has yet to be validated on standard hallucination benchmarks; and (3) interpretability is improved because each generated token can be traced back to encoder activations. A real-world experimental evaluation shows that KIT achieves competitive answer accuracy while offering better controllability and substantially lower update costs than decoder-only baselines. These results indicate that modular encoder–decoder architectures are a promising alternative for explainable, adaptable, and domain-specific question answering in modern information systems.
Kirichenko et al. studied this question.
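The separation of roles described above can be illustrated with a deliberately tiny sketch. It is not the KIT model: the neural encoder and decoder are replaced by a lookup table and a fixed sentence template, and every name in it (`FrozenDecoder`, `FactEncoder`, `answer`, the "Acme" facts) is hypothetical. It is meant only to show the interface contract: facts live solely in the encoder, the decoder is never modified, and questions outside the encoder's domain are refused rather than guessed.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


@dataclass(frozen=True)
class FrozenDecoder:
    """Toy stand-in for the knowledge-agnostic decoder.

    It knows only sentence structure; frozen=True makes its single
    "parameter" (the template) immutable, mimicking frozen weights.
    """
    template: str = "The {relation} of {subject} is {obj}."

    def generate(self, latent: Fact) -> str:
        subject, relation, obj = latent
        return self.template.format(subject=subject, relation=relation, obj=obj)


@dataclass
class FactEncoder:
    """Toy stand-in for the fact encoder: the only place facts are stored."""
    memory: Dict[Tuple[str, str], Fact] = field(default_factory=dict)

    def fit(self, facts: Iterable[Fact]) -> None:
        # "Retraining" in this toy simply rebuilds the table from scratch.
        self.memory = {(s, r): (s, r, o) for s, r, o in facts}

    def encode(self, subject: str, relation: str) -> Optional[Fact]:
        # The returned latent doubles as a trace of where the answer came from.
        return self.memory.get((subject, relation))


def answer(encoder: FactEncoder, decoder: FrozenDecoder,
           subject: str, relation: str) -> str:
    latent = encoder.encode(subject, relation)
    if latent is None:
        # Domain boundary: no encoder entry means no answer is generated.
        return "I don't know."
    return decoder.generate(latent)


decoder = FrozenDecoder()
encoder = FactEncoder()

encoder.fit([("Acme", "CEO", "Alice")])
print(answer(encoder, decoder, "Acme", "CEO"))  # The CEO of Acme is Alice.

# The fact changes: only the encoder is refit, the decoder object is reused as-is.
encoder.fit([("Acme", "CEO", "Bob")])
print(answer(encoder, decoder, "Acme", "CEO"))  # The CEO of Acme is Bob.
print(answer(encoder, decoder, "Acme", "CFO"))  # I don't know.
```

In the architecture the abstract describes, `fit` would correspond to retraining the encoder and `generate` to autoregressive decoding conditioned on the latent representation; the toy keeps only the division of responsibilities.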