January 1, 2021 | Open Access

What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think

Previous work has shown that human evaluations in NLP are notoriously under-powered.

Howcroft and Rieser (2021) studied this question.

David M. Howcroft

Verena Rieser

Heriot-Watt University

Edinburgh Napier University
