Millions of people worldwide face barriers in accessing and understanding complex written information due to limited literacy. Automatic text simplification (ATS) addresses this challenge by transforming complex texts into simpler, more accessible versions. However, most existing ATS research focuses on English, leaving Spanish, a language spoken by over 500 million people, underrepresented. This paper fills this gap by introducing large-scale sentence-aligned simplification resources for Spanish, developed from the Newsela and ClearSim corpora. We propose detailed guidelines for manual alignment, evaluate a wide range of automatic sentence alignment algorithms, and present the first systematic exploration of LLM-based monolingual sentence alignment in Spanish. Our analysis incorporates comprehensive quantitative and qualitative evaluation, supported by statistical significance testing, and reveals clear differences in the structural simplification patterns across corpora. In addition, we train and release baseline ATS models using the new aligned datasets, demonstrating their practical utility for downstream simplification. All alignment code, trained models, and evaluation scripts will be publicly released to ensure transparency and reproducibility. Together, these contributions substantially advance the resources and methodology for Spanish-language ATS.
Niklaus et al. studied this question.