Context: Fraud and corruption are among the main crimes affecting public institutions, with the healthcare sector being particularly vulnerable due to its structural complexity, the coexistence of public and private providers, the large number of actors involved, the globalized nature of supply chains, the high financial costs, and the information asymmetry among stakeholders. These factors weaken healthcare systems, resulting in resource waste, reduced resilience during medical emergencies, and limited access to essential services.

Objective: This study aims to evaluate automatic text summarization methods by comparing the quality of machine-generated summaries with those produced by humans, from the perspective of Data Scientists and SUS Auditors, within the context of audits carried out by the National Department of Unified Health System (Sistema Único de Saúde, SUS) Auditing (AudSUS).

Method: A controlled experiment was conducted to assess the performance of Small Language Models (SLMs) in summarization tasks, using the metrics ROUGE-N, ROUGE-L, BLEU, METEOR, and BERTScore. In addition, the consistency of the models' results across 35 runs, their contribution to reducing information overload, and their pairwise performance were evaluated.

Results: The models NousResearch/Hermes-3-Llama-3.2-3B, Qwen/Qwen2.5-7B-Instruct, and meta-llama/Llama-3.2-3B-Instruct achieved the highest average performance across all metrics, standing out for their ability to preserve contextual meaning and to synthesize essential information more effectively than human-generated summaries.

Conclusion: The findings highlight the potential of SLMs as tools to reduce information overload, thereby enhancing the effectiveness of the analytical phase of audits and enabling faster preparation of teams for the operational stage.
Guimarães et al. studied this question.
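To make the overlap-based metrics named in the Method more concrete, the following is a minimal sketch of ROUGE-N and ROUGE-L in plain Python. It is not the authors' evaluation code: it assumes lowercased whitespace tokenization with no stemming, and the two example sentences are invented for illustration only. Standard ROUGE implementations apply their own tokenization and optional stemming, so their scores may differ from these. BLEU, METEOR, and BERTScore are not covered here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N as (precision, recall, F1) over clipped n-gram overlap."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall, f1(precision, recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as (precision, recall, F1) based on the LCS length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    precision = lcs / max(len(cand), 1)
    recall = lcs / max(len(ref), 1)
    return precision, recall, f1(precision, recall)


# Hypothetical sentences, not taken from the study's data.
reference = "the audit found irregular payments in the hospital"
candidate = "audit found irregular hospital payments"

print("ROUGE-1:", rouge_n(candidate, reference, n=1))
print("ROUGE-2:", rouge_n(candidate, reference, n=2))
print("ROUGE-L:", rouge_l(candidate, reference))
```

In this sketch the human-written summary plays the role of `reference` and the model output the role of `candidate`. ROUGE-N rewards shared n-grams regardless of position, while ROUGE-L rewards words that appear in the same order, which is why reordering two words lowers ROUGE-L and ROUGE-2 but leaves ROUGE-1 unchanged.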