Prompt injection attacks represent a critical vulnerability in Large Language Model (LLM) agent systems, enabling attackers to hijack agent behavior through malicious instructions embedded in untrusted content. Existing defenses—including paraphrasing, detection models, and instruction hierarchy—provide only partial protection and remain vulnerable to adaptive attacks. We propose ÞÝÐING (Icelandic: 'translation'), a novel defense mechanism that sanitizes untrusted input through randomized multi-hop translation across linguistically diverse language families. Unlike single-hop back-translation, ÞÝÐING chains 3-6 translations through randomly selected languages (e.g., English → Mongolian → Finnish → Arabic → English), destroying syntactic attack patterns while preserving semantic content. The randomized selection from a pool of 15+ languages creates combinatorial unpredictability (32,760+ possible 4-hop chains), making it computationally intractable for attackers to craft universal injections. We further introduce defensive prompt augmentation, instructing translation models to explain code rather than reproduce it, converting executable syntax into descriptive prose. We present theoretical foundations and propose experimental methodology for validation.
Helgason et al. (Sat,) studied this question.