Crowdsourcing is widely used to solicit judgements from people in diverse applications, ranging from evaluating information quality to rating gig worker performance. To encourage the crowd to put genuine effort into these judgement tasks, various ways of structuring and organizing the tasks have been explored, though an understanding of how these task design choices influence the crowd’s judgements is still largely lacking. In this paper, using recidivism risk evaluation as an example, we conduct a randomized experiment to examine the effects of two common designs of crowdsourcing judgement tasks (encouraging the crowd to deliberate and providing feedback to the crowd) on the quality, strictness, and fairness of the crowd’s recidivism risk judgements. Our results show that different designs of the judgement tasks significantly affect the strictness of the crowd’s judgements. Moreover, task designs also have the potential to significantly influence how fairly the crowd judges defendants from different racial groups, in those cases where the crowd exhibits substantial in-group bias. Finally, we find that the impacts of task designs on the judgements also vary with the crowd workers’ own characteristics, such as their cognitive reflection levels. Together, these results highlight the importance of obtaining a nuanced understanding of the relationship between task designs and the properties of crowdsourced judgements.
Duan et al. studied this question.