Abstract

Unbiased proficiency estimates are important for autoscoring engines because the resulting scores may be used for future learning or placement decisions. Imbalanced training data can introduce bias and lower the prediction accuracy of classification algorithms. In this article, we investigated several data augmentation methods for reducing the negative effect of imbalanced data in measurement settings. Four approaches were examined: (1) resampling methods, either oversampling or undersampling; (2) active resampling methods, in which the resampling weight is based on representativeness in the training set; (3) data expansion methods using synonym replacement, which slightly change the meaning or semantics of the original answers; and (4) a content recreation method using generative AI (GenAI, e.g., ChatGPT) to create responses for less populated score levels. We compared performance metrics (e.g., accuracy, quadratic weighted kappa (QWK), and F1) as well as a distance metric across different combinations of the methods, using two datasets with different imbalanced distributions. Results show that all four methods can help mitigate the bias issue, and that their efficacy was influenced by the imbalance level, the representativeness of the original data, and the degree to which a method increased the variety of the responses (i.e., lexical diversity). In general, resampling and GenAI with active resampling showed the best overall performance.
Liu et al. studied this question.
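To make two of the named ingredients concrete, the sketch below shows plain random oversampling (duplicating responses from under-represented score levels until every level matches the largest one) and the quadratic weighted kappa used for evaluation. It is a minimal illustration under our own assumptions (integer score labels, Python standard library only, hypothetical function names `random_oversample` and `quadratic_weighted_kappa`), not the authors' implementation; active resampling, synonym replacement, and GenAI generation are not shown.

```python
import random
from collections import Counter


def random_oversample(responses, scores, seed=0):
    """Hypothetical helper: randomly duplicate responses from minority score
    levels until every score level has as many items as the largest one."""
    rng = random.Random(seed)
    by_score = {}
    for text, score in zip(responses, scores):
        by_score.setdefault(score, []).append(text)
    target = max(len(items) for items in by_score.values())
    out_x, out_y = [], []
    for score, items in by_score.items():
        # Draw (with replacement) the extra copies needed to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_x.append(text)
            out_y.append(score)
    return out_x, out_y


def quadratic_weighted_kappa(y_true, y_pred, min_score, max_score):
    """Cohen's kappa with quadratic disagreement weights (QWK)."""
    k = max_score - min_score + 1
    # Observed agreement matrix: rows are true scores, columns are predictions.
    observed = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_score][p - min_score] += 1
    n = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic penalty for disagreement
            expected = hist_true[i] * hist_pred[j] / n  # chance agreement
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den


texts = ["a", "b", "c", "d", "e"]
labels = [0, 0, 0, 1, 2]
X, y = random_oversample(texts, labels)
print(Counter(y))  # each score level now appears 3 times
print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], 0, 2))
```

Random oversampling only repeats existing responses, so it balances the label distribution without adding lexical variety; that limitation is what the synonym-replacement and GenAI approaches described above aim to address.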