Random forest versus logistic regression: a large-scale benchmark experiment

Abstract

Random forest (RF) performed better than logistic regression (LR) according to the considered accuracy measure in approximately 69% of the datasets. The mean difference between RF and LR was 0.029 (95%-CI = [0.022, 0.038]) for the accuracy, 0.041 (95%-CI = [0.031, 0.053]) for the area under the curve (AUC), and -0.027 (95%-CI = [-0.034, -0.021]) for the Brier score. Because a lower Brier score is better, all three measures suggest a significantly better performance of RF. As a side result of our benchmarking experiment, we observed that the results were noticeably dependent on the inclusion criteria used to select the example datasets, which emphasizes the importance of clear statements regarding this dataset selection process. We also stress that neutral studies similar to ours, carefully designed and based on a high number of datasets, will be necessary in the future to further evaluate variants, implementations or parameters of random forests that may yield improved accuracy compared to the original version with default values.
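The three measures and the confidence interval of the mean difference can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the 0.5 classification threshold, the toy numbers, and the percentile bootstrap over datasets are all assumptions made for illustration, since the abstract does not state how the intervals were computed.

```python
import random
from statistics import mean


def accuracy(y_true, p_hat, threshold=0.5):
    """Share of observations whose thresholded probability matches the 0/1 label.
    The 0.5 threshold is an assumption of this sketch."""
    return mean(int((p >= threshold) == bool(y)) for y, p in zip(y_true, p_hat))


def auc(y_true, p_hat):
    """AUC as the probability that a random positive is ranked above a random
    negative, counting ties as one half."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, p_hat):
    """Mean squared difference between predicted probability and label (lower is better)."""
    return mean((p - y) ** 2 for y, p in zip(y_true, p_hat))


def bootstrap_mean_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Mean of per-dataset differences with a percentile bootstrap interval.
    Resampling whole datasets with replacement is an assumption of this sketch."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return mean(diffs), lower, upper


if __name__ == "__main__":
    # Hypothetical labels and predicted probabilities for one tiny dataset.
    y = [1, 0, 1, 0]
    p_rf = [0.9, 0.2, 0.6, 0.4]
    p_lr = [0.7, 0.4, 0.4, 0.6]
    print("accuracy difference (RF - LR):", accuracy(y, p_rf) - accuracy(y, p_lr))
    print("AUC difference (RF - LR):", auc(y, p_rf) - auc(y, p_lr))
    print("Brier difference (RF - LR):", brier(y, p_rf) - brier(y, p_lr))

    # Hypothetical per-dataset accuracy differences (RF - LR), one value per dataset.
    diffs = [0.05, 0.01, -0.02, 0.04, 0.03, 0.06, 0.00, 0.02]
    print("mean difference and 95% interval:", bootstrap_mean_ci(diffs))
```

In a benchmark of this kind, each entry of `diffs` would be the difference in estimated performance between the two methods on one dataset, so the interval describes variation across datasets rather than across observations. Positive differences favour RF for accuracy and AUC, while negative differences favour RF for the Brier score.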

Cite This Study

Couronné et al. studied this question.

synapsesocial.com/papers/69d56eb875589c71d767d56c https://doi.org/10.1186/s12859-018-2264-5
