Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how (mostly intrinsic) human evaluation is currently conducted and presents a set of best practices, grounded in the literature. These best practices are also linked to the stages that researchers go through when conducting evaluation research (planning stage; execution and release stage), and the specific steps in these stages. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.
Chris van der Lee
Albert Gatt
Emiel van Miltenburg
Computer Speech & Language
Tilburg University
University of Malta
van der Lee et al. studied this question.
www.synapsesocial.com/papers/69d6bceff174babf6cab3553 — DOI: https://doi.org/10.1016/j.csl.2020.101151