In judged sports such as rhythmic gymnastics, figure skating, and baton twirling, inter-judge variability (the degree to which judges differ in their scoring of the same athlete) is a common concern. This study quantitatively examined inter-judge variability using freestyle scores from the World Baton Twirling Championships held in 2018 and 2022. Data were collected from the preliminary rounds of the senior and junior women's divisions, with scores assigned by seven judges. Welch's analyses of variance were performed to assess the effect of athlete ranking group (high, middle, low) on inter-judge variability for each of two scoring axes: technical merit (TM) and artistic expression (AE). These analyses were conducted separately for each competition year (2018 and 2022) and division (senior and junior). In the senior division in 2018, inter-judge variability for both TM and AE was significantly lower in the high-ranking group than in the other two ranking groups. In the junior division in 2018, inter-judge variability for both TM and AE was significantly lower in the high-ranking group than in the low-ranking group. These findings are interpreted in terms of the interaction between the competitive structure of baton twirling and the cognitive processes involved in judging.
Natsuda et al. (Sun,) studied this question.
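The analysis described above can be illustrated with a minimal sketch. The abstract does not state how inter-judge variability was quantified, so the sketch assumes the sample standard deviation of the seven judges' scores for each athlete. The scores, the `judge_spread` and `welch_anova` helper names, and the group sizes are hypothetical and are not taken from the championship data. The function computes only Welch's F statistic and its degrees of freedom; obtaining a p-value would additionally require the F distribution.

```python
from statistics import mean, stdev, variance


def judge_spread(scores):
    """Assumed variability measure: sample SD of one athlete's judge scores."""
    return stdev(scores)


def welch_anova(groups):
    """Welch's one-way ANOVA: returns (F, df1, df2) for a list of samples."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2, so noisier groups count less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    numerator = sum(w * (m - grand_mean) ** 2
                    for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)


# Hypothetical scores: each inner list holds seven judges' marks for one athlete.
panel_scores = {
    "high": [
        [8.9, 9.0, 9.0, 9.1, 8.9, 9.0, 9.1],
        [8.7, 8.8, 8.8, 8.7, 8.9, 8.8, 8.7],
        [9.2, 9.1, 9.2, 9.3, 9.2, 9.1, 9.2],
    ],
    "middle": [
        [7.4, 7.8, 7.5, 7.9, 7.3, 7.6, 7.7],
        [7.0, 7.5, 7.2, 7.1, 7.6, 7.3, 6.9],
        [7.8, 7.4, 7.9, 7.5, 7.6, 8.0, 7.3],
    ],
    "low": [
        [5.5, 6.2, 5.8, 6.4, 5.3, 6.0, 5.6],
        [6.1, 5.4, 5.9, 6.5, 5.7, 5.2, 6.3],
        [5.0, 5.8, 5.3, 6.0, 5.5, 4.9, 5.7],
    ],
}

# One variability value per athlete, grouped by ranking group.
spreads = [[judge_spread(athlete) for athlete in athletes]
           for athletes in panel_scores.values()]

f_stat, df1, df2 = welch_anova(spreads)
print(f"Welch F = {f_stat:.2f}, df1 = {df1}, df2 = {df2:.2f}")
```

Welch's variant is used here because it does not assume equal variances across ranking groups, which matters when the spread of judge disagreement itself differs between groups. In the study's design, this comparison would be repeated for each scoring axis (TM and AE), competition year, and division.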