The recent success of machine learning methods applied to time series from Intensive Care Units (ICU) exposes the lack of standardized learning benchmarks for developing and comparing such methods. While datasets such as MIMIC-IV or eICU can be freely accessed on Physionet, tasks and pre-processing are often chosen ad hoc for each publication, limiting comparability across publications. In this work, we aim to improve this situation by providing a benchmark covering a large spectrum of ICU-related tasks. Using the HiRID dataset, we define multiple clinically relevant tasks in collaboration with clinicians. In addition, we provide an end-to-end pipeline to construct both data and labels. Finally, we provide an in-depth analysis of current state-of-the-art sequence modeling methods, highlighting some limitations of deep learning approaches for this type of data. With this benchmark, we hope to give the research community the possibility of a fair comparison of their work.
Yèche et al. (Tue,) studied this question.