Background: Plain language summaries (PLS) aim to make scientific research understandable to non-specialists, yet producing clear and accurate summaries remains challenging, especially for non-native English writers. Advances in large language models (LLMs) make automated summarisation a potential solution. However, few studies have directly compared human-written PLS with outputs from multiple LLMs on the same dataset.

Aims: To assess whether LLMs can support or augment human efforts to produce effective and accessible scientific communication.

Materials and Methods: In this cross-sectional study, 30 human-written PLS were compared with 180 PLS generated by six LLMs. Readability was assessed using Flesch Reading Ease, Flesch–Kincaid Grade Level, sentence length and syllables per word. Three independent reviewers evaluated clarity, inclusiveness, interpretation and factual accuracy. Group differences were analysed using one-way ANOVA with post hoc Tukey testing.

Results: LLM-generated summaries were significantly more readable than human-written summaries across all metrics (P < 0.001). Human-authored PLS showed marginally higher factual accuracy, though overall reviewer-rated quality did not differ significantly. Among the LLMs, Gemini produced the simplest text, whereas Meta AI demonstrated the best balance of readability and quality.

Conclusion: LLMs can generate PLS that are comparable in quality to human-written summaries while offering substantially improved readability. These tools may enhance accessibility for diverse audiences, though human oversight remains essential to ensure contextual accuracy and interpretive depth.
Jain et al. studied this question.
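
The four readability measures named in the Materials and Methods follow standard published formulas, so they can be illustrated with a short sketch. The Python below is not the authors' tooling: the function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration (dedicated readability tools typically use dictionaries or more careful rules, so their scores will differ somewhat).

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as 'table'.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return sentence length, syllables per word, and both Flesch scores."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    return {
        "words_per_sentence": words_per_sentence,
        "syllables_per_word": syllables_per_word,
        # Flesch Reading Ease: higher scores mean easier text.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Flesch-Kincaid Grade Level: approximate US school grade.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    for name, value in readability("The cat sat on the mat.").items():
        print(f"{name}: {value:.2f}")
```

Both scores depend only on average sentence length and average syllables per word, which is why shorter sentences and shorter words move Reading Ease up and Grade Level down together.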