In traditional classification problems, the reference needed for training a classifier is given and considered to be absolutely correct. However, this does not apply to all tasks. In emotion recognition in non-acted speech, for instance, one often does not know which emotion the speaker really intended. Hence, the data is annotated by a group of human labelers who, in most cases, do not agree on a single class. Often, similar classes are confused systematically. We propose a new entropy-based method for evaluating classification results that takes these systematic confusions into account. We show that a classifier which achieves a recognition rate of "only" about 60% on a four-class problem performs as well as our five human labelers on average.
Steidl et al. studied this question.
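The abstract does not give the formula behind the entropy-based measure, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that disagreement on an item is measured as the Shannon entropy of the labelers' votes, and that a classifier is judged by swapping its decision in for one labeler's vote and comparing the average entropy. The function names, the class symbols, and the toy votes are all hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy in bits of the class distribution within one item's votes."""
    counts = Counter(votes)
    total = len(votes)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def mean_entropy(items):
    """Average per-item vote entropy over a list of vote lists."""
    return sum(vote_entropy(votes) for votes in items) / len(items)


def mean_entropy_with_decider(labels, decisions, replaced):
    """Average entropy after `decisions` replace the votes of labeler `replaced`.

    Assumption: the classifier is treated like one more labeler, so a decision
    for a class that some labelers also chose raises entropy less than a
    decision for a class nobody chose.
    """
    swapped = [
        votes[:replaced] + [decision] + votes[replaced + 1:]
        for votes, decision in zip(labels, decisions)
    ]
    return mean_entropy(swapped)


# Hypothetical toy data: two items, five labelers, classes "a" to "d".
labels = [
    ["a", "a", "a", "b", "b"],
    ["c", "c", "c", "c", "d"],
]
decisions = ["a", "c"]  # hypothetical classifier output per item

baseline = mean_entropy(labels)
with_classifier = mean_entropy_with_decider(labels, decisions, replaced=3)
print(f"labelers only: {baseline:.3f} bits")
print(f"classifier in place of labeler 3: {with_classifier:.3f} bits")
```

Under this reading, a classifier that keeps the average entropy at or below the labelers-only baseline would be disagreeing with the group no more than a typical human labeler does, even if its plain recognition rate looks modest.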