What question did this study set out to answer?

The study aims to use AI to reduce the linguistic complexity of difficult texts so that general audiences can understand them more easily.

February 11, 2026 · Open Access

Resolving Information Asymmetry: A Framework for Reducing Linguistic Complexity Using Denoising Objectives

Key Points

Developed a framework that casts controllable text simplification as a denoising language modeling task.
Used Asymmetry-Aware Masking to identify complex terms by their reconstruction difficulty (negative log-likelihood).
Employed paraphrase context prompting to maintain meaning while simplifying.
Implemented an adaptive decoding strategy to minimize complexity dynamically.
Achieved a SARI score of 42.90 and an FKGL of 7.10 on the ASSET dataset, indicating effective simplification.
Maintained high sentence similarity (0.948) while reducing linguistic complexity.
Performed consistently with a SARI score of 41.10 on TurkCorpus, showing robustness across datasets.
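
For readers unfamiliar with the FKGL figure reported above, the following is a minimal sketch of the standard Flesch-Kincaid Grade Level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The syllable counter is a rough vowel-group heuristic chosen for illustration, and the function names are hypothetical; this is not the evaluation code used in the study, whose tooling is not described here.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (assumption: a crude heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level; lower values indicate easier text."""
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    print(fkgl("The cat sat on the mat."))
    print(fkgl("Pharmacological interventions necessitate comprehensive evaluation."))
```

Short sentences built from short words push the score down, which is why a lower FKGL is read as evidence of simplification.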

Abstract

Information asymmetry between complex source texts and general-audience comprehension remains a critical challenge in Artificial Intelligence. Existing supervised simplification methods suffer from the scarcity of parallel training data, while standard text summarization methods often discard essential details to reduce length. Furthermore, zero-shot large language models frequently lack fine-grained controllability over linguistic complexity. To address these limitations, we present a framework that resolves information asymmetry by casting text simplification as a controllable denoising language modeling task. Unlike summarization, our approach preserves full semantic coverage while reducing difficulty. Our algorithm identifies and rewrites complex spans without labeled data through three mechanisms: (1) Asymmetry-Aware Masking, which uses model-based reconstruction difficulty (Negative Log-Likelihood) to isolate high-complexity terms; (2) paraphrase context prompting to enforce semantic invariance; and (3) an adaptive decoding strategy that dynamically penalizes complex tokens based on input difficulty. On ASSET (Abstractive Sentence Simplification Evaluation and Tuning dataset), our best setting reaches a SARI (System output Against References and against the Input) of 42.90 with an FKGL (Flesch–Kincaid Grade Level) of 7.10 and a Sentence Similarity of 0.948. It performs consistently on TurkCorpus (SARI 41.10), while requiring no parallel data or fine-tuning.
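
To make mechanisms (1) and (3) more concrete, here is a toy sketch under loudly stated assumptions. It swaps the model-based reconstruction difficulty for a smoothed unigram negative log-likelihood estimated from a tiny reference corpus, and it guesses at one plausible form of the adaptive penalty (a weight proportional to the mean NLL of the input). The abstract does not specify the language model, the threshold, or the penalty formula, so every name, constant, and formula below is hypothetical.

```python
import math
from collections import Counter

MASK = "<mask>"  # hypothetical mask symbol, not taken from the paper


def build_unigram_nll(corpus_tokens):
    """Return a function giving add-one-smoothed unigram NLL per token.

    Assumption: a unigram model stands in for the model-based
    reconstruction difficulty described in the abstract.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen tokens

    def nll(token):
        return -math.log((counts.get(token, 0) + 1) / (total + vocab))

    return nll


def mask_difficult(tokens, nll, threshold):
    """Replace tokens whose NLL exceeds the threshold with the mask symbol."""
    return [MASK if nll(t) > threshold else t for t in tokens]


def adaptive_weight(tokens, nll, base_weight=0.5):
    """Scale the complexity penalty by the mean NLL of the input (assumed form)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return base_weight * sum(nll(t) for t in tokens) / len(tokens)


def rescore(candidates, nll, weight):
    """Subtract weight * NLL from each candidate's generator log-probability."""
    return {tok: lp - weight * nll(tok) for tok, lp in candidates.items()}


if __name__ == "__main__":
    corpus = "the doctor gave the patient a drug to help the heart".split()
    nll = build_unigram_nll(corpus)
    source = "the doctor gave the patient anticoagulants to help the heart".split()
    print(mask_difficult(source, nll, threshold=3.0))
    weight = adaptive_weight(source, nll)
    # Hypothetical generator log-probabilities for filling the masked slot.
    print(rescore({"drug": -1.2, "anticoagulant": -1.0}, nll, weight))
```

In this toy setup the rare term is the only token masked, and the penalty tips the choice toward the more frequent candidate even though the generator slightly preferred the rare one. A real system would presumably obtain both scores from a pretrained language model instead of corpus counts.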
