June 30, 2024 | Open Access

A Comparative Study of Quality Evaluation Methods for Text Summarization

Abstract

Evaluating text summarization has been a challenging task in natural language processing (NLP). Automatic metrics that rely heavily on reference summaries are unsuitable in many situations, while human evaluation is time-consuming and labor-intensive. To bridge this gap, this paper proposes a novel method based on large language models (LLMs) for evaluating text summarization. We also conduct a comparative study of eight automatic metrics, human evaluation, and our proposed LLM-based method. Seven different types of state-of-the-art (SOTA) summarization models were evaluated. We perform extensive experiments and analysis on datasets of patent documents. Our results show that LLM-based evaluation aligns closely with human evaluation, while widely used automatic metrics such as ROUGE-2, BERTScore, and SummaC do not, and also lack consistency. Based on the empirical comparison, we propose an LLM-powered framework for automatically evaluating and improving text summarization, which is beneficial and could attract wide attention in the community.
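To make concrete why a reference-based metric depends so heavily on the reference summary, the following is a minimal sketch of a ROUGE-2-style bigram overlap score. It is a simplified illustration only: it assumes lowercased whitespace tokenization with no stemming, the function names `bigrams` and `rouge2_f1` are hypothetical, and it is not the implementation used in the paper.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs, using naive lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate, reference):
    """ROUGE-2-style F1: clipped bigram overlap between candidate and reference."""
    cand, ref = bigrams(candidate), bigrams(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# A faithful paraphrase shares few bigrams with the reference,
# so it scores low even though the meaning is preserved.
paraphrase_score = rouge2_f1(
    "the device stores energy",
    "energy is stored by the device",
)
```

In this sketch the paraphrase shares only one bigram with the reference, so its score is far below that of a verbatim copy, which illustrates the reference dependence the abstract describes.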

Cite This Study

Nguyen et al. (2024) studied this question.

synapsesocial.com/papers/68e625cfb6db6435875b8070 https://doi.org/10.48550/arxiv.2407.00747
