What question did this study set out to answer?

To improve access to complex texts in Spanish by developing and evaluating sentence alignment methods for automatic text simplification.

March 5, 2026

Open Access

A comparative study of sentence alignment methods for Spanish text simplification

Key Points

Developed large-scale sentence-aligned resources from the Newsela and ClearSim corpora.
Proposed guidelines for manual sentence alignment.
Evaluated various automatic sentence alignment algorithms.
Conducted a systematic exploration of LLM-based monolingual sentence alignment in Spanish.
Employed comprehensive quantitative and qualitative analysis with statistical significance testing.
Identified clear differences in structural simplification patterns across Spanish corpora.
Trained and released baseline automatic text simplification models using the aligned datasets.
Demonstrated the practical utility of the new alignment methods for downstream text simplification.

Abstract

Millions of people worldwide face barriers in accessing and understanding complex written information due to limited literacy. Automatic text simplification (ATS) addresses this challenge by transforming complex texts into simpler, more accessible versions. However, most existing ATS research focuses on English, leaving Spanish, a language spoken by over 500 million people, underrepresented. This paper fills this gap by introducing large-scale sentence-aligned simplification resources for Spanish, developed from the Newsela and ClearSim corpora. We propose detailed guidelines for manual alignment, evaluate a wide range of automatic sentence alignment algorithms, and present the first systematic exploration of LLM-based monolingual sentence alignment in Spanish. Our analysis incorporates comprehensive quantitative and qualitative evaluation, supported by statistical significance testing, and reveals clear differences in the structural simplification patterns across corpora. In addition, we train and release baseline ATS models using the new aligned datasets, demonstrating their practical utility for downstream simplification. All alignment code, trained models, and evaluation scripts will be publicly released to ensure transparency and reproducibility. Together, these contributions substantially advance the resources and methodology for Spanish-language ATS.
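To make the term concrete: sentence alignment pairs each sentence in a simplified text with the sentence in the original text that it was derived from. The sketch below illustrates the idea with word-overlap (Jaccard) similarity, a simple lexical baseline chosen here for illustration only; it is not claimed to be one of the algorithms the study evaluates. The function names, the example sentences, and the 0.3 threshold are all hypothetical.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two sentences, in [0, 1]."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def align(complex_sents, simple_sents, threshold=0.3):
    """Pair each simple sentence with its most similar complex sentence.

    Returns (complex_index, simple_index, score) tuples. Simple sentences
    whose best score falls below the (assumed) threshold stay unaligned.
    """
    pairs = []
    if not complex_sents:
        return pairs
    for j, simple in enumerate(simple_sents):
        scores = [jaccard(c, simple) for c in complex_sents]
        best_i = max(range(len(scores)), key=scores.__getitem__)
        if scores[best_i] >= threshold:
            pairs.append((best_i, j, scores[best_i]))
    return pairs


# Hypothetical example sentences, not taken from Newsela or ClearSim.
complex_sents = [
    "El gobierno anunció nuevas medidas económicas para reducir la inflación.",
    "Los científicos descubrieron una especie de rana en la selva amazónica.",
]
simple_sents = [
    "Los científicos encontraron una rana nueva en la selva.",
    "El gobierno anunció medidas para bajar la inflación.",
]

for i, j, score in align(complex_sents, simple_sents):
    print(f"complex[{i}] <-> simple[{j}] (score {score:.2f})")
```

A purely lexical measure like this tends to miss pairs where the simplification replaces most of the vocabulary, which is one reason embedding-based and LLM-based aligners are worth comparing against it.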

Cite This Study

Niklaus et al. studied this question.

synapsesocial.com/papers/69a91e2cd6127c7a504c1d48 https://doi.org/10.1007/s10579-025-09879-4