January 25, 2007 (Open Access)

Bias in random forest variable importance measures: Illustrations, sources and a solution

Carolin Strobl, University of Zurich; Anne-Laure Boulesteix, Zimmer Biomet (Netherlands); Achim Zeileis, Universität Innsbruck

Abstract

We propose to employ an alternative implementation of random forests that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. Therefore, the suggested method can be applied straightforwardly by scientists in bioinformatics research.
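The distinction between bootstrap sampling and subsampling without replacement, which the abstract relies on, can be illustrated with a minimal sketch. It is not the authors' R implementation: the function names `bootstrap_indices` and `subsample_indices` are hypothetical, and the subsample fraction of 0.632 is an assumption chosen for illustration only.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n row indices WITH replacement (classical bootstrap).

    Some rows appear several times and others not at all.
    """
    return [rng.randrange(n) for _ in range(n)]


def subsample_indices(n, rng, fraction=0.632):
    """Draw round(fraction * n) distinct row indices WITHOUT replacement.

    The default fraction is an illustrative assumption, not a value
    taken from the abstract.
    """
    size = round(fraction * n)
    return rng.sample(range(n), size)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    n = 100

    boot = bootstrap_indices(n, rng)
    sub = subsample_indices(n, rng)

    # A bootstrap sample usually contains repeated rows;
    # a subsample never does.
    print("bootstrap: drawn =", len(boot), "distinct =", len(set(boot)))
    print("subsample: drawn =", len(sub), "distinct =", len(set(sub)))
```

In a forest, each tree would be grown on the rows selected by one such draw. The sketch only shows that a subsample contains no repeated observations, whereas a bootstrap sample generally does.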
