What question did this study set out to answer?

This work aims to enhance language understanding by reframing it as latent state prediction instead of token generation.

May 26, 2026 · Open Access

ZeroID: Latent Reasoning via Ternary Joint Embedding Prediction with Continual Learning and Distributed Memory

Key Points

Developed the ZeroID framework, which integrates a Joint Embedding Predictive Architecture (JEPA) with BitNet ternary quantization.
Used Elastic Weight Consolidation (EWC) for continual learning across a 7-phase curriculum.
Implemented a FAISS-backed distributed vector store for memory augmentation, with novelty-gated admission.
Reduced the VICReg objective from 61.9 to 15.9 during training.
Observed a trivial-solution collapse in which the predictor closely mimicked the encoder (cosine similarity > 0.99).
Identified a critical capacity threshold for effective ternary latent prediction.
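
The VICReg objective and the collapse symptom in the points above can be made concrete with a small sketch. It follows the published VICReg formulation (an invariance term, a variance hinge and a covariance penalty, with the commonly used weights 25, 25 and 1); the weights ZeroID actually uses, and which tensors it feeds in, are not stated here, so the assumption is that the two branches are the predictor's output and the target encoder's embedding for the same batch. `mean_cosine` is a hypothetical helper for the collapse check, where an average cosine similarity above 0.99 between predictor and encoder outputs is the reported symptom. Pure Python, no dependencies.

```python
import math

# Minimal sketch (assumption): VICReg loss on two batches of embeddings given as
# lists of rows (n samples x d dimensions), plus a cosine check for predictor collapse.


def _covariance(z):
    """Unbiased d x d covariance matrix of the batch z."""
    n, d = len(z), len(z[0])
    mean = [sum(row[j] for row in z) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in z]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]


def invariance_term(z_a, z_b):
    """Mean squared error between paired embeddings, averaged over all n * d entries."""
    n, d = len(z_a), len(z_a[0])
    return sum((a - b) ** 2 for ra, rb in zip(z_a, z_b) for a, b in zip(ra, rb)) / (n * d)


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge on each dimension's standard deviation: penalizes std below gamma."""
    cov = _covariance(z)
    d = len(cov)
    return sum(max(0.0, gamma - math.sqrt(cov[j][j] + eps)) for j in range(d)) / d


def covariance_term(z):
    """Sum of squared off-diagonal covariances divided by d: decorrelates dimensions."""
    cov = _covariance(z)
    d = len(cov)
    return sum(cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j) / d


def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0):
    return (lam * invariance_term(z_a, z_b)
            + mu * (variance_term(z_a) + variance_term(z_b))
            + nu * (covariance_term(z_a) + covariance_term(z_b)))


def mean_cosine(z_a, z_b):
    """Average row-wise cosine similarity between two batches."""
    total = 0.0
    for ra, rb in zip(z_a, z_b):
        dot = sum(a * b for a, b in zip(ra, rb))
        total += dot / (math.sqrt(sum(a * a for a in ra)) * math.sqrt(sum(b * b for b in rb)))
    return total / len(z_a)


encoder_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
predictor_out = [[1.01, 0.0], [0.0, 0.99], [-1.0, 0.01], [0.0, -1.0]]
print(vicreg_loss(predictor_out, encoder_out))
print(mean_cosine(predictor_out, encoder_out) > 0.99)  # True: predictor is close to an identity map
```

In this sketch a falling VICReg value does not by itself rule out the reported failure: the variance and covariance terms stop the embeddings from collapsing to a constant, but they do not stop a predictor from learning a near-identity map, which is why the cosine check is tracked separately.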

Abstract

Current autoregressive language models excel at syntax but struggle with factual grounding, which leads to hallucination. This work presents ZeroID, a system built on the premise that language understanding should be framed as latent state prediction rather than token generation. The architecture integrates three core elements: a Joint Embedding Predictive Architecture (JEPA) for text; BitNet 1.58-bit ternary quantization (weights in {-1, 0, +1}) applied via straight-through estimators; and Elastic Weight Consolidation (EWC) to support continual learning across a 7-phase curriculum. Memory is augmented with a FAISS-backed distributed vector store that uses novelty-gated admission. We report results from a 512-dimensional proof-of-concept training run in which the Variance-Invariance-Covariance Regularization (VICReg) objective decreased from 61.9 to 15.9. Despite this reduction, we observe a trivial-solution collapse in which the predictor functionally mimics the encoder (cosine similarity > 0.99), highlighting a critical capacity threshold that ternary latent prediction must exceed. We hypothesize that scaling the architecture, combined with its lack of token-level reconstruction, offers a structural approach to mitigating hallucination.
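
The ternary quantization step can be illustrated with the absmean scheme described for BitNet b1.58: weights are divided by their mean absolute value, then rounded and clipped to {-1, 0, +1}. The sketch below is a minimal pure-Python version under the assumption that ZeroID quantizes per tensor in this way; `absmean_ternary` and `ste_step` are hypothetical names, and the one-output linear layer with squared-error loss is a toy stand-in for a real network. The straight-through estimator appears in `ste_step`: the forward pass uses the quantized weights, while the backward pass treats rounding as the identity and applies the resulting gradient to full-precision latent weights.

```python
# Minimal sketch (assumption): per-tensor absmean ternary quantization as described
# for BitNet b1.58, plus a straight-through estimator (STE) on a toy linear layer.


def absmean_ternary(weights, eps=1e-5):
    """Scale by the mean |w|, then round and clip every weight to {-1, 0, +1}."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    # Python's round() rounds halves to even; the clip keeps values in [-1, 1].
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma


def ste_step(latent, x, target, lr=0.1):
    """One SGD step for y = gamma * sum(q_i * x_i) with squared-error loss.

    Forward: uses the ternary weights q (rescaled by gamma).
    Backward (STE): treats quantization as the identity, so the gradient
    w.r.t. the effective weights is applied directly to the latent weights.
    """
    q, gamma = absmean_ternary(latent)
    w_eff = [gamma * qi for qi in q]
    y = sum(w * xi for w, xi in zip(w_eff, x))
    loss = (y - target) ** 2
    grad = [2.0 * (y - target) * xi for xi in x]  # dL/dw_eff
    new_latent = [w - lr * g for w, g in zip(latent, grad)]
    return new_latent, loss


latent = [0.9, -0.05, -1.2, 0.3]
q, gamma = absmean_ternary(latent)
print(q)  # [1, 0, -1, 0]
latent, loss = ste_step(latent, x=[1.0, 2.0, 0.5, -1.0], target=1.0)
print(latent)  # the weight that quantized to 0 has still been updated
```

Because the gradient reaches the latent weights unchanged, a weight that currently quantizes to 0 (the second entry above) still moves and can later cross the rounding boundary to ±1. Without the estimator, the rounding step would have zero gradient almost everywhere and the latent weights would never change.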
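
EWC adds a quadratic penalty that anchors each parameter to its value at the end of a previous phase, weighted by an estimate of how important that parameter was (the diagonal of the Fisher information). The sketch below is a minimal pure-Python version of the standard formulation, L(θ) = L_new(θ) + Σ_i (λ/2) F_i (θ_i − θ*_i)², assuming ZeroID keeps one anchor per completed curriculum phase and sums the penalties. The helper names and the λ value are hypothetical, and the Fisher diagonal is estimated here as the mean of squared per-example gradients.

```python
# Minimal sketch (assumption): EWC with one (theta_star, fisher) anchor per completed phase.


def diagonal_fisher(per_example_grads):
    """Empirical diagonal Fisher: mean of squared per-example gradients at the phase optimum."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def ewc_penalty(theta, theta_star, fisher, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(f * (t - ts) ** 2 for t, ts, f in zip(theta, theta_star, fisher))


def ewc_grad(theta, theta_star, fisher, lam):
    """Gradient of ewc_penalty w.r.t. theta: lam * F_i * (theta_i - theta_star_i)."""
    return [lam * f * (t - ts) for t, ts, f in zip(theta, theta_star, fisher)]


def total_ewc_penalty(theta, anchors, lam):
    """Sum one penalty per completed phase; anchors is a list of (theta_star, fisher)."""
    return sum(ewc_penalty(theta, ts, f, lam) for ts, f in anchors)


# Phase 1 ends at theta_star; parameter 0 had large gradients, parameter 1 tiny ones.
theta_star = [1.0, -2.0]
fisher = diagonal_fisher([[0.5, 0.0], [1.5, 0.1]])  # approximately [1.25, 0.005]
anchors = [(theta_star, fisher)]

# Moving each parameter by 1.0 is penalized far more for the "important" parameter 0.
print(ewc_grad([2.0, -1.0], theta_star, fisher, lam=2.0))
print(total_ewc_penalty([2.0, -1.0], anchors, lam=2.0))
```

Under this assumption, each phase boundary appends a new anchor, and during the next phase the total penalty is simply added to that phase's training loss.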
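
Novelty-gated admission can be sketched as a similarity check before each write: a new embedding is stored only if it is not too close to anything already in memory. The version below is a single-process, brute-force stand-in for a FAISS inner-product index (comparable to `faiss.IndexFlatIP` over L2-normalized vectors, where inner product equals cosine similarity). It does not model distribution across nodes, and the class name, the 0.95 threshold and the payload field are hypothetical, since the exact admission rule is not specified here.

```python
import math

# Minimal sketch (assumption): brute-force cosine search standing in for a FAISS
# inner-product index; entries are admitted only if they are sufficiently novel.


class NoveltyGatedMemory:
    def __init__(self, threshold=0.95):
        self.threshold = threshold  # hypothetical: cosine at or above this counts as a duplicate
        self.vectors = []           # L2-normalized embeddings
        self.payloads = []          # whatever is stored alongside each embedding

    @staticmethod
    def _normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("cannot store or query a zero vector")
        return [x / norm for x in v]

    def search(self, query, k=1):
        """Return the k most similar (cosine, payload) pairs, best first."""
        u = self._normalize(query)
        scored = [(sum(a * b for a, b in zip(u, s)), p) for s, p in zip(self.vectors, self.payloads)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

    def admit(self, v, payload):
        """Store v only if its nearest neighbor's cosine similarity is below the threshold."""
        nearest = self.search(v, k=1)
        if nearest and nearest[0][0] >= self.threshold:
            return False
        self.vectors.append(self._normalize(v))
        self.payloads.append(payload)
        return True


memory = NoveltyGatedMemory(threshold=0.95)
print(memory.admit([1.0, 0.0, 0.0], "fact A"))                 # True: memory was empty
print(memory.admit([0.99, 0.05, 0.0], "near-duplicate of A"))  # False: cosine is about 0.999
print(memory.admit([0.0, 1.0, 0.0], "fact B"))                 # True: orthogonal to A
print(memory.search([0.1, 0.9, 0.0], k=1))
```

Gating at write time keeps near-duplicates out of the store, so retrieval returns distinct entries rather than many copies of the same fact; the threshold trades memory growth against recall of closely related items.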

Cite This Study

Sujith B (Sun,) studied this question.

synapsesocial.com/papers/6a153a2eb5d9c58d83e8cefd https://doi.org/10.5281/zenodo.20369145