What question did this study set out to answer?

The aim is to create a faster and more memory-efficient framework for optimal sparse decision tree classifiers using Python.

March 5, 2026Open Access

Optimizing GOSDT-Guesses: A faster, memory-efficient python implementation with LightGBM-based threshold guessing

Key Points

The aim is to create a faster and more memory-efficient framework for optimal sparse decision tree classifiers using Python.
Reimplemented GOSDT-Guesses in Python to improve accessibility.
Leveraged LightGBM for effective threshold guessing.
Introduced adaptive threshold grouping to mitigate feature explosion.
Designed a depth guessing algorithm based on feature characteristics.
Implemented pruning techniques to accommodate various loss functions.
Achieved predictive performance comparable to or better than GOSDT-Guesses and black-box models.
Demonstrated significant improvements in training time and memory efficiency.

Abstract

Decision trees are among the most interpretable machine learning models, yet building globally optimal and sparse decision trees efficiently remains a significant challenge. In this thesis, we present a memory-efficient framework for constructing Optimal Sparse Decision Tree Classifiers, improving upon the existing GOSDT and GOSDT-Guesses methods. All our experiments are conducted on classification datasets, and our proposed enhancements aim to address limitations in scalability, efficiency, and flexibility by reimplementing the GOSDT-Guesses pipeline entirely in Python-removing the dependency on a C++ backend and enabling easier experimentation. Key contributions of our methodology include: (1) leveraging LightGBM for threshold guessing to generate effective candidate splits; (2) introducing adaptive threshold grouping to reduce feature explosion during binarization; (3) designing a data-sensitive depth guessing algorithm based on feature cardinality and class entropy; and (4) implementing lower-bound pruning techniques that support a variety of loss functions beyond entropy. Additionally, (5) we redesign the tree construction process using recursive search and memoization, with support for multiple loss functions and a mechanism for certifying optimality. Experimental results on several benchmark classification datasets show that our framework achieves predictive performance comparable to or better than GOSDT-Guesses and common black-box models, while offering significant improvements in training time and memory efficiency. This work presents a practical and interpretable solution for optimal decision tree classifier construction on complex, high-dimensional datasets.

Bookmark

View Full Paper

Bookmark

View Full Paper

Optimizing GOSDT-Guesses: A faster, memory-efficient python implementation with LightGBM-based threshold guessing

Key Points

Abstract

Cite This Study