March 3, 2026 · Open Access

Follow the Forest Trail: Distillation by Gradient Boosting Models to Enhance Symbolic Regression Performance

Key Points

The proposed method improves the accuracy of symbolic regression while preserving the interpretability that many applications require.
Across three symbolic regression models and 10 regression datasets, it yields a consistent 3.3-6.7% improvement in accuracy over standard symbolic regression.
A gradient boosting ‘teacher’ is trained first, and a symbolic regression ‘student’ is then fit to its outputs, distilling the teacher's knowledge into an interpretable formula.
The same distillation framework can also serve as an explainability tool for interpreting complex machine learning models.

Abstract

In recent years, Machine Learning (ML) models have been introduced across diverse scientific fields due to their strong predictive performance. However, in many applications the demand for interpretable models outweighs the need for accuracy alone. Symbolic Regression (SR) offers a solution with its concise and explainable formulas, yet it typically falls short in accuracy when compared to other ML models such as Random Forests. This paper proposes a novel two-step ensemble approach, inspired by the student-teacher model in knowledge distillation. First, a Gradient Boosting Model (GBM) is trained to harness its high accuracy; this model acts as the ‘teacher’. Subsequently, an SR ‘student’ model is trained on the output of the gradient boosting model, effectively distilling the knowledge into a form that combines interpretability with enhanced performance. We perform an extensive evaluation of this method across three different SR models and 10 regression datasets. Our findings show that the proposed method exhibits a consistent improvement of 3.3-6.7% in accuracy compared to standard SR models. Furthermore, this process serves as an explainability framework that helps to uncover and interpret the decisions of complex ML models.
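The two-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, the teacher is a toy gradient boosting ensemble of one-split stumps, and the student is a hypothetical stand-in for an SR model that searches a three-formula library (`CANDIDATES`). The synthetic data, the function names, and the hyperparameters are all assumptions chosen for illustration.

```python
import random

def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (squared-error criterion)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    total = sum(residuals)
    best = None  # (score, threshold, left_mean, right_mean)
    left_sum = 0.0
    for k in range(1, n):
        left_sum += residuals[order[k - 1]]
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # cannot split between identical x values
        right_sum = total - left_sum
        # Maximizing this score is equivalent to minimizing the split's squared error.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or score > best[0]:
            best = (score, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]

def fit_teacher(xs, ys, rounds=200, lr=0.3):
    """Step 1: toy gradient boosting teacher (stumps fit to squared-loss residuals)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, left, right = fit_stump(xs, residuals)
        stumps.append((thr, left, right))
        preds = [p + lr * (left if x <= thr else right) for x, p in zip(xs, preds)]

    def predict(x):
        return base + lr * sum(l if x <= t else r for t, l, r in stumps)

    return predict

# Hypothetical formula library standing in for a real symbolic regression search.
CANDIDATES = {
    "a*x + b": lambda x: x,
    "a*x**2 + b": lambda x: x * x,
    "a*abs(x) + b": abs,
}

def fit_student(xs, targets):
    """Step 2: pick the candidate formula that best reproduces the teacher's outputs."""
    n = len(xs)
    t_mean = sum(targets) / n
    best = None  # (mse, formula, a, b)
    for formula, feature in CANDIDATES.items():
        f = [feature(x) for x in xs]
        f_mean = sum(f) / n
        var = sum((v - f_mean) ** 2 for v in f)
        a = sum((v - f_mean) * (t - t_mean) for v, t in zip(f, targets)) / var
        b = t_mean - a * f_mean
        mse = sum((a * v + b - t) ** 2 for v, t in zip(f, targets)) / n
        if best is None or mse < best[0]:
            best = (mse, formula, a, b)
    return best

# Synthetic data (an assumption for this sketch): y = x^2 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-2, 2) for _ in range(200)]
ys = [x * x + rng.gauss(0, 0.1) for x in xs]

teacher = fit_teacher(xs, ys)
teacher_outputs = [teacher(x) for x in xs]  # the student never sees the raw labels
mse, formula, a, b = fit_student(xs, teacher_outputs)
print(f"student formula: {formula}, a={a:.2f}, b={b:.2f}, MSE vs teacher={mse:.4f}")
```

The key design point is that `fit_student` is trained on `teacher_outputs` rather than on `ys`. In practice the toy formula library would be replaced by a full SR model and the stump ensemble by a production GBM.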


Authors

Assaf Shmuel

Teddy Lazebnik

Oren Glickman

Journals

IEEE Access

Institutions

Bar-Ilan University

University of Haifa

Cite this study

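Shmuel et al. (2026). Follow the Forest Trail: Distillation by Gradient Boosting Models to Enhance Symbolic Regression Performance. IEEE Access.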

synapsesocial.com/papers/69a75b2dc6e9836116a22039 — DOI: https://doi.org/10.1109/access.2026.3657793

