As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern text generation models by computing information divergences in a quantized embedding space. Through an extensive empirical study on three open-ended generation tasks, we find that MAUVE identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.
Pillutla et al. studied this question.
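
To make the idea of a divergence frontier over a quantized embedding space more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the embedding and quantization steps have already been done, so that the model text and the human text are each summarized as a histogram over the same set of clusters. The function names, the number of mixture weights, and the scaling constant are illustrative assumptions, and the area under the curve is approximated with a simple trapezoid rule.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing. The result is clamped at 0 to
    absorb tiny negative values caused by floating-point rounding.
    """
    total = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return max(0.0, total)


def divergence_frontier(p, q, num_points=25, scale=5.0):
    """Exponentiated divergence curve between histograms p and q.

    For each mixture weight lam in (0, 1), form r = lam * p + (1 - lam) * q
    and record (exp(-scale * KL(q || r)), exp(-scale * KL(p || r))).
    num_points and scale are assumed values chosen for illustration.
    """
    points = [(0.0, 1.0), (1.0, 0.0)]  # endpoints that close the curve
    for i in range(1, num_points + 1):
        lam = i / (num_points + 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        x = math.exp(-scale * kl_divergence(q, r))
        y = math.exp(-scale * kl_divergence(p, r))
        points.append((x, y))
    return points


def frontier_area(p, q, num_points=25, scale=5.0):
    """Trapezoid-rule area under the curve: near 1 for similar histograms."""
    # Sort by x; on ties, put the larger y first so vertical drops come last.
    pts = sorted(divergence_frontier(p, q, num_points, scale),
                 key=lambda pt: (pt[0], -pt[1]))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += 0.5 * (x1 - x0) * (y0 + y1)
    return area


# Hypothetical cluster histograms for human text and two models.
human = [0.25, 0.25, 0.25, 0.25]
model_close = [0.30, 0.20, 0.25, 0.25]
model_far = [0.85, 0.05, 0.05, 0.05]

score_close = frontier_area(model_close, human)
score_far = frontier_area(model_far, human)
```

In this sketch a histogram that is closer to the human histogram yields a larger area, which is the behaviour the measure is meant to capture. In practice the histograms would come from clustering embeddings of many text samples, a step omitted here.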