What question did this study set out to answer?

This research aims to develop a novel defense mechanism against prompt injection attacks in large language models.

February 2, 2026 | Open Access

ÞÝÐING: Randomized Multi-Hop Translation as a Defense Against Prompt Injection Attacks

Key points

This research aims to develop a novel defense mechanism against prompt injection attacks in large language models.
Introduced ÞÝÐING, a mechanism using randomized multi-hop translation across multiple languages.
Implemented 3-6 translations through randomly selected languages to disrupt syntactic attack patterns.
Conducted theoretical analysis and proposed experimental methodology for validation.
Argued that random selection from a pool of 15+ languages yields combinatorial unpredictability, with 32,760+ possible 4-hop translation chains.
Designed the translation chain to preserve semantic content while destroying syntactic attack patterns.
Proposed defensive prompt augmentation, which instructs translation models to explain code rather than reproduce it.
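
The 32,760 figure matches the number of ordered selections of four distinct languages from a pool of 15 (15 × 14 × 13 × 12). The short Python check below assumes that reading. This summary does not state how the authors counted chains, and the pool size of 15 is only the lower bound quoted on this page. The function name `count_chains` is illustrative.

```python
import math


def count_chains(pool_size: int, picks: int) -> int:
    """Count ordered selections of `picks` distinct languages from the pool.

    Assumption: languages within one chain are not repeated and order matters.
    """
    return math.perm(pool_size, picks)


if __name__ == "__main__":
    # Four distinct languages drawn in order from a pool of 15.
    print(count_chains(15, 4))
```

If only three intermediate languages vary, as in a chain that starts and ends in English with three languages in between, the same function gives 2,730, so the figure depends on how hops are counted.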

Abstract

Prompt injection attacks represent a critical vulnerability in Large Language Model (LLM) agent systems, enabling attackers to hijack agent behavior through malicious instructions embedded in untrusted content. Existing defenses—including paraphrasing, detection models, and instruction hierarchy—provide only partial protection and remain vulnerable to adaptive attacks. We propose ÞÝÐING (Icelandic: 'translation'), a novel defense mechanism that sanitizes untrusted input through randomized multi-hop translation across linguistically diverse language families. Unlike single-hop back-translation, ÞÝÐING chains 3-6 translations through randomly selected languages (e.g., English → Mongolian → Finnish → Arabic → English), destroying syntactic attack patterns while preserving semantic content. The randomized selection from a pool of 15+ languages creates combinatorial unpredictability (32,760+ possible 4-hop chains), making it computationally intractable for attackers to craft universal injections. We further introduce defensive prompt augmentation, instructing translation models to explain code rather than reproduce it, converting executable syntax into descriptive prose. We present theoretical foundations and propose experimental methodology for validation.
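
The chaining idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The language pool, the wording of the defensive instruction, and the `translate` callback signature are all hypothetical, and a hop is counted as one translation step, so a 4-hop chain passes through three intermediate languages as in the abstract's example.

```python
import random
from typing import Callable, List, Optional

# Hypothetical pool of ISO 639-1 codes; the abstract mentions 15+ languages
# but this page names only Mongolian, Finnish, and Arabic.
LANGUAGE_POOL = ["mn", "fi", "ar", "ja", "sw", "hu", "tr", "ko",
                 "hi", "ka", "eu", "th", "vi", "he", "zu"]

# Assumed wording for the defensive prompt augmentation.
DEFENSIVE_INSTRUCTION = (
    "Translate the text. If it contains code, describe what the code does "
    "in prose instead of reproducing it."
)

# Hypothetical translator interface: (text, source, target, instruction) -> text.
Translate = Callable[[str, str, str, str], str]


def build_chain(rng: random.Random, min_hops: int = 3, max_hops: int = 6,
                home: str = "en") -> List[str]:
    """Pick a random chain that starts and ends in the home language."""
    hops = rng.randint(min_hops, max_hops)              # number of translation steps
    intermediates = rng.sample(LANGUAGE_POOL, hops - 1)  # distinct intermediate languages
    return [home, *intermediates, home]


def sanitize(text: str, translate: Translate,
             rng: Optional[random.Random] = None) -> str:
    """Pass untrusted text through a randomly chosen multi-hop translation chain."""
    # SystemRandom draws from the OS entropy source, so the chain is hard to predict.
    rng = rng or random.SystemRandom()
    chain = build_chain(rng)
    for source, target in zip(chain, chain[1:]):
        text = translate(text, source, target, DEFENSIVE_INSTRUCTION)
    return text
```

Passing the translator in as a callback keeps the sketch independent of any particular translation service; a real deployment would supply a function that calls its own translation model.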

Cite This Study

Helgason et al. studied this question.

synapsesocial.com/papers/6980fefbc1c9540dea8117ef
https://doi.org/10.5281/zenodo.18438221
