ABSTRACT In the PISA 2022 creative thinking test, students provide a response to a prompt, which is then coded by human raters as no credit, partial credit, or full credit. Like many large‐scale educational testing frameworks, PISA uses the generalized partial credit model (GPCM) as a response model for these ordinal ratings. In this paper, we show that the instructions given to the raters violate some assumptions of the GPCM as it is used: Raters are instructed to rate according to steps that involve multiple attributes (appropriateness and diversity/originality), with a different (set of) attribute(s) necessary to pass the different thresholds of the scoring scale. Instead of the GPCM, we propose multidimensional generalized item response tree models that allow us to account for the sequential nature of the ratings and to disentangle the attributes measured from the original scores. We discuss the advantages and limitations of this approach, as well as recommendations for future research.
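The contrast between the two model families can be sketched in a few lines of Python. This is a minimal illustration, not the parameterization used in the paper: the function names, the parameter values, and the assumption that the first threshold depends on appropriateness alone and the second on diversity/originality alone are all hypothetical. The GPCM function uses the standard form with a single latent trait, while the tree function treats the 0/1/2 score as the outcome of two sequential pass/fail decisions, each driven by its own trait.

```python
import math


def gpcm_probs(theta, a, thresholds):
    """Category probabilities under the standard GPCM (one latent trait).

    theta: person ability; a: item discrimination;
    thresholds: step parameters b_1..b_m (hypothetical values in the usage).
    """
    # Cumulative logits: 0 for category 0, then running sums of a * (theta - b_v).
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def tree_probs(theta_appropriate, theta_original, node1, node2):
    """Category probabilities under a two-node sequential tree (illustrative).

    Assumption for this sketch: node 1 (no credit vs. any credit) depends only
    on appropriateness, and node 2 (partial vs. full credit) depends only on
    diversity/originality. Each node is a (discrimination, difficulty) pair.
    """
    a1, b1 = node1
    a2, b2 = node2
    p_pass1 = logistic(a1 * (theta_appropriate - b1))
    p_pass2 = logistic(a2 * (theta_original - b2))
    return [
        1.0 - p_pass1,              # no credit: fails the first step
        p_pass1 * (1.0 - p_pass2),  # partial credit: passes step 1, fails step 2
        p_pass1 * p_pass2,          # full credit: passes both steps
    ]
```

Under the tree sketch, two students with the same appropriateness trait but different originality traits have the same probability of receiving no credit and differ only in how the remaining probability is split between partial and full credit. The single-trait GPCM cannot express that pattern, which is the kind of separation the abstract refers to as disentangling the attributes.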
Myszkowski et al. studied this question.