Abstract
Accurate predictions of celestial pole offsets (CPO), one of the Earth Orientation Parameters (EOP), are essential for deep-space navigation, astronomy, and high-precision positioning. The evaluation of these forecasts, such as those collected during the Second Earth Orientation Parameters Prediction Comparison Campaign (2nd EOP PCC), typically relies on comparison against a single reference series. However, publicly available CPO solutions from different analysis centers vary considerably, raising questions about the objectivity of assessments based on a single reference. This study investigates the impact of reference series selection on the estimated errors of CPO predictions. We re-evaluated CPO predictions from the 2nd EOP PCC against nine different reference solutions, including both combined and very long baseline interferometry (VLBI)-only series. Our analysis demonstrates that the choice of reference can fundamentally alter the assessment of CPO forecast accuracy. The mean absolute error (MAE) calculated for predictions from the same method can vary by up to 0.076 mas for dX and 0.062 mas for dY, depending solely on the reference series used for validation. In many cases, the differences between MAE values calculated using different reference data exceeded the errors of the forecasts themselves, which could affect the reliability of forecast assessments. This issue is compounded by the fact that the prediction models are often trained on input data series that differ from the single reference series used for the evaluation. Predictions based on different methods and input data exhibited smaller forecast errors when the International Earth Rotation and Reference Systems Service (IERS) C04 14 and International VLBI Service for Geodesy and Astrometry (IVS) Rapid series were used as references than when other observational data were used.
Our study also highlights significant differences between the observational CPO series themselves, including variations in oscillation amplitudes at specific frequencies and the presence of outlying values. Our findings suggest that future forecast evaluations would benefit from a multi-reference framework to ensure a more robust and objective assessment.
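As a minimal illustration of the multi-reference idea, the sketch below computes the MAE of one prediction against several reference series and reports how far the MAE values spread. It is not the evaluation code used in the study: the function names, the reference labels `ref_a` and `ref_b`, and all numerical values are hypothetical and chosen only to show the calculation.

```python
from statistics import mean


def mean_absolute_error(prediction, reference):
    """MAE between a predicted series and a reference series of equal length."""
    if len(prediction) != len(reference):
        raise ValueError("prediction and reference must have equal length")
    return mean(abs(p - r) for p, r in zip(prediction, reference))


def mae_by_reference(prediction, references):
    """Map each reference name to the MAE of the prediction against it."""
    return {
        name: mean_absolute_error(prediction, series)
        for name, series in references.items()
    }


def mae_spread(mae_values):
    """Difference between the largest and smallest MAE across references."""
    return max(mae_values.values()) - min(mae_values.values())


# Made-up dX values in mas, purely illustrative (not data from the study).
prediction = [0.10, 0.12, 0.15, 0.11]
references = {
    "ref_a": [0.11, 0.10, 0.16, 0.13],  # hypothetical reference series
    "ref_b": [0.20, 0.05, 0.25, 0.02],  # hypothetical reference series
}

maes = mae_by_reference(prediction, references)
spread = mae_spread(maes)

for name, value in maes.items():
    print(f"MAE vs {name}: {value:.3f} mas")
print(f"Spread across references: {spread:.3f} mas")
```

A spread comparable to or larger than the individual MAE values, as in this toy case, corresponds to the situation described above in which the choice of reference dominates the apparent forecast error.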
Partyka et al. (Sun,) studied this question.