November 22, 2020 (Open Access)

Human evaluation of automatically generated text: Current trends and best practice guidelines

Abstract

Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how (mostly intrinsic) human evaluation is currently conducted and presents a set of best practices, grounded in the literature. These best practices are also linked to the stages that researchers go through when conducting evaluation research (planning stage; execution and release stage), and the specific steps in these stages. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.

Authors

Chris van der Lee

Albert Gatt

Emiel van Miltenburg

Journals

Computer Speech & Language

Institutions

Tilburg University

University of Malta
