What question did this study set out to answer?

The aim is to assess how well small language models can summarize health-related news compared to human summaries.

February 8, 2026 | Open Access

Small language models applied in text summarization task of health-related news to improve public health audit: an experimental case study

Key Points

Assessed how well small language models summarize health-related news compared with human-written summaries
Conducted a controlled experiment evaluating automatic summarization methods
Compared machine-generated summaries with human-generated ones
Measured performance with ROUGE-N, ROUGE-L, BLEU, METEOR, and BERTScore
Evaluated the consistency of results across 35 runs
Top models scored higher on average than human-generated summaries at preserving contextual meaning and synthesizing essential information
Findings point to the potential of these models to reduce information overload during audits
Highest average performance achieved by NousResearch/Hermes-3-Llama-3.2-3B, Qwen/Qwen2.5-7B-Instruct, and meta-llama/Llama-3.2-3B-Instruct

Abstract

Context: Fraud and corruption are among the main crimes affecting public institutions, with the healthcare sector being particularly vulnerable due to its structural complexity, the coexistence of public and private providers, the large number of actors involved, the globalized nature of supply chains, the high financial costs, and the information asymmetry among stakeholders. These factors weaken healthcare systems, resulting in resource waste, reduced resilience during medical emergencies, and limited access to essential services.

Objective: This study aims to evaluate automatic text summarization methods by comparing the quality of machine-generated summaries with those produced by humans, from the perspective of Data Scientists and SUS Auditors, within the context of audits carried out by the National Department of Unified Health System (Sistema Único de Saúde, SUS) Auditing (AudSUS).

Method: A controlled experiment was conducted to assess the performance of Small Language Models (SLMs) in summarization tasks, using the metrics ROUGE-N, ROUGE-L, BLEU, METEOR, and BERTScore. In addition, the consistency of results across 35 runs, their contribution to reducing information overload, and their pairwise performances were evaluated.

Results: The models NousResearch/Hermes-3-Llama-3.2-3B, Qwen/Qwen2.5-7B-Instruct, and meta-llama/Llama-3.2-3B-Instruct achieved the highest average performances across all metrics, standing out for their ability to preserve contextual meaning and synthesize essential information more effectively than human-generated summaries.

Conclusion: The findings highlight the potential of SLMs as tools to reduce information overload, thereby enhancing the effectiveness of the analytical phase of audits and enabling faster preparation of teams for the operational stage.
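For readers unfamiliar with the overlap metrics named in the Method, the following is a minimal Python sketch of how ROUGE-N and ROUGE-L can be computed for a candidate summary against a reference. It is an illustration only, not the study's evaluation code: the whitespace tokenization, the lowercasing, the choice to report an F1 score, and the function names (`rouge_n`, `rouge_l`, `lcs_length`) are all assumptions made here, and the example sentences are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, candidate_total, reference_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or candidate_total == 0 or reference_total == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference.

    Assumption: simple lowercased whitespace tokenization.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1: based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Invented example sentences, not data from the study.
    machine = "the audit found fraud in hospital contracts"
    human = "the audit found evidence of fraud in contracts"
    print("ROUGE-1:", round(rouge_n(machine, human, 1), 4))
    print("ROUGE-2:", round(rouge_n(machine, human, 2), 4))
    print("ROUGE-L:", round(rouge_l(machine, human), 4))
```

For the invented pair above, six of the candidate's seven words also appear in the eight-word reference, so ROUGE-1 comes out at about 0.8. BLEU, METEOR, and BERTScore are not sketched here, since they depend on brevity penalties, synonym resources, or a pretrained embedding model.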
