What question did this study set out to answer?

The study compared three statistical models (MIRT, HO-IRT, and Bifactor) for simultaneously estimating overall scores and subscores in multidimensional tests.

February 11, 2026 · Open Access

Comparison of Models for Simultaneous Estimation of Overall Score and Subscores: Estimation Accuracy, Reliability, and Classification Accuracy

Key Points

Evaluated the MIRT, HO-IRT, and Bifactor models using simulated and real data
Conducted a simulation with 5,000 respondents, 120 items, and four dimensions
Manipulated test characteristics, including item format, test difficulty, and inter-dimensional correlation
MIRT outperformed the other models, with the lowest RMSE and the highest reliability
HO-IRT also performed strongly, though less well than MIRT
The Bifactor model underperformed, especially in subscore estimation
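The simulation design summarized above can be pictured as a data-generating step. The following is a minimal Python sketch, not the study's code: it assumes dichotomous 2PL items, a simple (between-item) structure with 30 items per dimension, one common correlation between all dimensions, and arbitrary parameter ranges. The function name `simulate_mirt_responses` and its parameters `corr` and `mean_b` are hypothetical. Raising `mean_b` mimics a harder test, and changing `corr` mimics the inter-dimensional correlation condition; item format (for example, polytomous items) is not modeled here.

```python
import numpy as np


def simulate_mirt_responses(n_persons=5000, n_items=120, n_dims=4,
                            corr=0.6, mean_b=0.0, seed=0):
    """Simulate 0/1 responses under a compensatory multidimensional 2PL model
    with simple structure (each item measures exactly one dimension).

    The 5,000 x 120 x 4 sizes follow the study's design; corr, mean_b, and
    the parameter distributions are hypothetical, illustrative choices.
    """
    if n_items % n_dims != 0:
        raise ValueError("n_items must divide evenly across dimensions")
    rng = np.random.default_rng(seed)

    # Correlated abilities: unit variances and a common correlation `corr`.
    sigma = np.full((n_dims, n_dims), corr)
    np.fill_diagonal(sigma, 1.0)
    theta = rng.multivariate_normal(np.zeros(n_dims), sigma, size=n_persons)

    # Equal blocks of items per dimension (30 each when 120 / 4).
    dim_of_item = np.repeat(np.arange(n_dims), n_items // n_dims)

    # Discriminations and difficulties; raising mean_b makes the test harder.
    a = rng.uniform(0.8, 2.0, size=n_items)
    b = rng.normal(mean_b, 1.0, size=n_items)

    # Each item's logit uses only the ability on its own dimension.
    logits = a * (theta[:, dim_of_item] - b)
    prob = 1.0 / (1.0 + np.exp(-logits))
    responses = (rng.uniform(size=prob.shape) < prob).astype(int)
    return theta, dim_of_item, responses


if __name__ == "__main__":
    theta, dims, x = simulate_mirt_responses()
    print(x.shape)                      # (5000, 120)
    # Raw subscore on dimension 0 and a simple overall raw score.
    sub0 = x[:, dims == 0].sum(axis=1)
    total = x.sum(axis=1)
    print(sub0.shape, total.shape)      # (5000,) (5000,)
```

Each candidate model (MIRT, HO-IRT, Bifactor) would then be fitted to `responses`, and its subscore and overall-score estimates compared against `theta`.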

Abstract

This study compares the performance of the Multidimensional Item Response Theory (MIRT), Higher-Order IRT (HO-IRT), and Bifactor models for the simultaneous estimation of total and subscale scores in multidimensional tests. Using both simulated data and real data from an English proficiency exam, model performance was evaluated in terms of estimation accuracy (root mean square error, RMSE), reliability, and classification accuracy. The simulation included 5,000 respondents, 120 items, and a four-dimensional structure, and manipulated item format, test difficulty, and inter-dimensional correlation. Results indicated that MIRT consistently outperformed the other models, yielding the lowest RMSE and the highest reliability and classification accuracy across conditions. HO-IRT also performed strongly, while the Bifactor model underperformed, particularly in subscore estimation. Model performance was also sensitive to test characteristics and to the correlations among dimensions. The real-data analysis supported the simulation results, underscoring the value of multidimensional modeling for providing diagnostic feedback and supporting informed decisions.
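The three evaluation criteria named in the abstract can be illustrated with small helper functions. This is a hedged Python sketch: RMSE is standard, but the two reliability indices and the single-cut classification index below are common choices assumed for illustration, since the study's exact definitions and cut scores are not given here. All function names are hypothetical.

```python
import numpy as np


def rmse(true, est):
    """Root mean square error between true and estimated scores."""
    true, est = np.asarray(true, dtype=float), np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((est - true) ** 2)))


def squared_corr_reliability(true, est):
    """Simulation-based reliability: squared correlation between true and
    estimated scores (only available when true scores are known)."""
    r = np.corrcoef(np.asarray(true, float), np.asarray(est, float))[0, 1]
    return float(r ** 2)


def empirical_reliability(est, se):
    """Empirical (marginal) reliability: var(est) / (var(est) + mean(se^2)).
    One common definition; assumed here, not taken from the study."""
    est, se = np.asarray(est, dtype=float), np.asarray(se, dtype=float)
    v = np.var(est)
    return float(v / (v + np.mean(se ** 2)))


def classification_accuracy(true, est, cut):
    """Share of examinees placed on the same side of a (hypothetical) cut
    score by their true and estimated scores."""
    true, est = np.asarray(true, dtype=float), np.asarray(est, dtype=float)
    return float(np.mean((true >= cut) == (est >= cut)))


if __name__ == "__main__":
    true = [-1.0, 0.0, 1.0, 2.0]
    est = [-0.5, 0.0, 0.5, 2.0]
    print(round(rmse(true, est), 3))                        # 0.354
    print(classification_accuracy(true, est, cut=0.75))     # 0.75
    print(round(empirical_reliability(est, [0.5] * 4), 3))  # 0.778
```

In a simulation study these quantities would typically be computed separately for each subscore and for the overall score, then averaged over replications, so that models can be compared condition by condition.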

