Missing values may arise in climate data collection due to sensor malfunction, transmission errors, device calibration, and operational issues. The problem is more severe in multi-dimensional, high-frequency climate data sets, where some or all climate readings may be missing at multiple timestamps. Such gaps in high-frequency climate modeling can lead to inaccurate prediction models, which in turn affect overall assessments, planning, and climate-related measures and policy. In this paper, we evaluate three imputation techniques, based on the mean, k-nearest neighbors, and time-based interpolation, together with a new temporal cross-year climate imputation approach, using random forest, long short-term memory (LSTM), and contextual embedding-based Transformer regression models. We discuss our findings on four years of multi-output, high-frequency, multi-dimensional climate data collected in Kuwait. Using leave-one-year-out cross-validation, our results show that all imputation methods perform better than no imputation, with LSTM and time-based interpolation emerging as the best combination. Imputing climate data from previous years' timestamps did not yield good results, highlighting the variability of climate data across years.
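The imputation strategies compared above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The function names (`mean_impute`, `time_interpolate`, `knn_impute`, `cross_year_impute`, `leave_one_year_out`) are hypothetical, and missing readings are assumed to be marked with `None`. Two rules are also our own assumptions, since the abstract does not specify them: the k-nearest-neighbor distance (root-mean-square difference over the columns both rows observe) and the cross-year rule (averaging readings taken at the same month, day, hour and minute in other years). The random forest, LSTM and Transformer regressors are not shown.

```python
"""Hypothetical sketch of the imputation strategies named in the abstract.

Standard library only; None marks a missing reading. Names and exact rules
(distance metric, cross-year averaging) are illustrative assumptions.
"""
import math
from datetime import datetime, timedelta
from statistics import mean


def mean_impute(series):
    """Replace every missing value with the mean of the observed values."""
    fill = mean(v for _, v in series if v is not None)
    return [(t, fill if v is None else v) for t, v in series]


def time_interpolate(series):
    """Linear interpolation weighted by elapsed time between the nearest observed
    readings. Assumes `series` is sorted by timestamp; gaps at either end take
    the nearest observed value (assumption)."""
    observed = [(t, v) for t, v in series if v is not None]
    out = []
    for t, v in series:
        if v is None:
            before = [p for p in observed if p[0] < t]
            after = [p for p in observed if p[0] > t]
            if before and after:
                (t0, v0), (t1, v1) = before[-1], after[0]
                # timedelta / timedelta yields a float fraction of the gap
                v = v0 + (t - t0) / (t1 - t0) * (v1 - v0)
            else:
                v = (before[-1] if before else after[0])[1]
        out.append((t, v))
    return out


def knn_impute(rows, k=2):
    """Fill each missing cell with the mean of that column over the k nearest
    rows that observe it; distance is the RMS difference over shared columns."""
    out = [list(r) for r in rows]  # copy so the input is not mutated
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            # Raises statistics.StatisticsError if no donor row exists
            out[i][j] = mean(c[1] for c in candidates[:k])
    return out


def cross_year_impute(series):
    """Fill a gap at timestamp t with the mean of readings at the same
    (month, day, hour, minute) slot in other years; stays None if none exist."""
    by_slot = {}
    for t, v in series:
        if v is not None:
            by_slot.setdefault((t.month, t.day, t.hour, t.minute), []).append((t.year, v))
    out = []
    for t, v in series:
        if v is None:
            slot = (t.month, t.day, t.hour, t.minute)
            others = [ov for yr, ov in by_slot.get(slot, []) if yr != t.year]
            v = mean(others) if others else None
        out.append((t, v))
    return out


def leave_one_year_out(timestamps):
    """Yield (held_out_year, train_indices, test_indices), one fold per year."""
    for held_out in sorted({t.year for t in timestamps}):
        train = [i for i, t in enumerate(timestamps) if t.year != held_out]
        test = [i for i, t in enumerate(timestamps) if t.year == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    start = datetime(2020, 1, 1)
    hourly = [(start, 10.0), (start + timedelta(hours=1), None),
              (start + timedelta(hours=3), 16.0)]
    print(mean_impute(hourly)[1][1], time_interpolate(hourly)[1][1])
    yearly = [(datetime(y, 1, 1), v) for y, v in [(2020, 5.0), (2021, None), (2022, 7.0)]]
    print(cross_year_impute(yearly)[1][1])
    print(knn_impute([[1.0, 2.0], [1.1, None], [5.0, 9.0], [1.2, 2.4]])[1][1])
    for year, train, test in leave_one_year_out([t for t, _ in yearly]):
        print(year, train, test)
```

Under this sketch's assumptions, `leave_one_year_out` only produces the index splits; fitting and scoring a regression model on each fold is left out. Note that `cross_year_impute` can only borrow values from other years, so it is directly exposed to the year-to-year variability that the abstract reports.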
Khan et al. studied this question.