Current autoregressive language models excel at syntax but struggle with factual grounding, which leads to hallucination. This work presents ZeroID, a system that frames language understanding as latent state prediction rather than token generation. The architecture integrates three core elements: a Joint Embedding Predictive Architecture (JEPA) for text; BitNet 1.58-bit ternary quantization (weights in {-1, 0, +1}) applied via straight-through estimators; and Elastic Weight Consolidation (EWC) to support continual learning across a 7-phase curriculum. Memory is augmented by a FAISS-backed distributed vector store with novelty-gated admission. We report results from a 512-dimensional proof-of-concept training run in which the Variance-Invariance-Covariance Regularization (VICReg) objective converges from 61.9 to 15.9. Although the objective decreases, we observe collapse to a trivial solution in which the predictor functionally mimics the encoder (cosine similarity > 0.99), which points to a critical capacity threshold for ternary latent prediction. We hypothesize that scaling the architecture, combined with its lack of token-level reconstruction, offers a structural approach to hallucination mitigation.
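As an illustration of the ternary quantization step, the following is a minimal sketch and not the ZeroID implementation. It assumes the absmean scaling rule from the BitNet b1.58 formulation (divide by the mean absolute weight, round, clip to [-1, 1]); the function names `ternary_quantize`, `ste_backward`, and `sgd_step` are hypothetical, and pure Python lists stand in for tensors.

```python
def ternary_quantize(weights, eps=1e-8):
    """Map full-precision weights to codes in {-1, 0, +1} plus a scale.

    Assumed rule (absmean): scale = mean(|w|), code = clip(round(w / scale), -1, 1).
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def ste_backward(grad_wrt_quantized):
    """Straight-through estimator: rounding has zero gradient almost
    everywhere, so the backward pass treats it as the identity and passes
    the gradient through to the latent full-precision weights unchanged."""
    return list(grad_wrt_quantized)


def sgd_step(latent_weights, grad_wrt_quantized, lr=0.1):
    """Update the latent full-precision weights using the STE gradient.
    The forward pass would re-quantize these weights on the next step."""
    grads = ste_backward(grad_wrt_quantized)
    return [w - lr * g for w, g in zip(latent_weights, grads)]


latent = [0.9, -0.05, -1.2, 0.3]
codes, scale = ternary_quantize(latent)
updated = sgd_step(latent, [0.5, 0.0, -0.5, 1.0])
```

The design point is that only the latent weights are updated; the ternary codes are recomputed from them on each forward pass, so small gradient steps can accumulate until a weight crosses a rounding boundary.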
Sujith B (Sun,) studied this question.