What does this research mean for the field?

The TPACO method outperforms seven state-of-the-art feature selection algorithms in terms of accuracy and stability across 21 benchmark datasets. Novelty: novel finding. Consensus alignment: supports consensus.

What question did this study set out to answer?

The aim is to improve feature selection by integrating temporal difference reinforcement learning with ant colony optimization.

February 28, 2026 (Open Access)

Temporal difference reinforcement learning-based ant colony optimization with extremized probability construction for feature selection

Key Points

The aim is to improve feature selection by integrating temporal difference reinforcement learning with ant colony optimization.
Developed a novel feature selection wrapper method named TPACO.
Implemented an extremized probabilistic path strategy for feature guidance.
Validated TPACO on 21 datasets using 10-fold cross-validation.
Compared performance against seven state-of-the-art feature selection algorithms.
TPACO outperformed seven existing feature selection algorithms.
Achieved high scores for precision, recall, and F1-measure with fewer than 10 features.
Demonstrated superior accuracy and stability across 21 benchmark datasets.
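
As a rough illustration of what "wrapper" evaluation with k-fold cross-validation involves, the sketch below scores a candidate feature subset (a 0/1 mask) by the mean held-out accuracy of a classifier trained only on the selected columns. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the names `k_fold_indices`, `nearest_centroid_predict`, and `wrapper_accuracy` are hypothetical, the nearest-centroid classifier is a stand-in (this summary does not name the classifier used), and the folds are contiguous and unshuffled for simplicity.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous, near-equal test folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (assumption): predict the class with the closest centroid."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], x))


def wrapper_accuracy(X, y, mask, k=10):
    """Mean k-fold accuracy using only the features where mask[j] == 1.

    Assumes k <= len(X) so that no fold is empty.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    Xs = [[row[j] for j in cols] for row in X]
    scores = []
    for test_idx in k_fold_indices(len(Xs), k):
        held_out = set(test_idx)
        train_X = [Xs[i] for i in range(len(Xs)) if i not in held_out]
        train_y = [y[i] for i in range(len(Xs)) if i not in held_out]
        correct = sum(
            nearest_centroid_predict(train_X, train_y, Xs[i]) == y[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return mean(scores)
```

In a wrapper method, a score of this kind serves as the fitness of each candidate subset, so the search algorithm is judged directly by classifier performance rather than by a filter statistic.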

Abstract

• Integrates TD(0) RL into BACO for adaptive heuristic learning.
• Extremized probabilistic path improves the exploration-exploitation balance.
• Validated on 21 datasets (14–2000 features), 30 runs, 10-fold cross-validation.
• Outperformed 7 state-of-the-art FS algorithms.
• Maintains high precision, recall and F1 with <10 features.

As an essential preprocessing technique in machine learning and pattern recognition, feature selection has become a focus of attention. Its primary goal is to select a subset of informative features in order to reduce dimensionality and improve accuracy. Intelligent algorithms have been successfully applied to feature selection, but their key parameters cannot be adjusted efficiently and dynamically during the computation, so the search can stagnate in a local optimum or fail to converge. In this study, a novel wrapper feature selection method called TPACO is proposed, based on the ant colony optimization algorithm. Unlike traditional methods, TPACO dynamically learns and updates heuristic information from experience during the search process, enabling more effective exploration of feature combinations. In addition, an extremized probability-based ant path strategy is introduced to guide feature selection toward higher-quality subsets. Experimental results on 21 benchmark classification datasets demonstrate that TPACO consistently outperforms seven state-of-the-art or classical FS methods in terms of accuracy and stability. These findings highlight the effectiveness of combining reinforcement learning with swarm intelligence for adaptive feature selection.
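
To make the idea of learning heuristic information with TD(0) inside a binary ant colony search more concrete, here is a minimal sketch. Only `td0_update` is the standard TD(0) rule, V ← V + α(r + γV′ − V). Everything else is an assumption made for illustration: the (skip, select) pair per feature, the pheromone-times-heuristic probability, the scheme in `update_heuristics` (reward given only at the last feature, bootstrapping from the best option of the next feature), and all function and parameter names. The extremized probability construction is not specified in this summary, so it is not modeled here.

```python
import random


def td0_update(value, reward, next_value, alpha=0.1, gamma=0.9):
    """Standard TD(0) update: V <- V + alpha * (r + gamma * V' - V)."""
    return value + alpha * (reward + gamma * next_value - value)


def select_probability(pheromone, heuristic, alpha_w=1.0, beta_w=1.0):
    """Probability of 'select' over 'skip' for one feature (assumed ACO-style rule).

    pheromone and heuristic are (skip, select) pairs of positive numbers.
    """
    weights = [(t ** alpha_w) * (h ** beta_w) for t, h in zip(pheromone, heuristic)]
    return weights[1] / (weights[0] + weights[1])


def construct_subset(pheromones, heuristics, rng):
    """One ant walks over the features and returns a 0/1 selection mask."""
    return [
        1 if rng.random() < select_probability(tau, eta) else 0
        for tau, eta in zip(pheromones, heuristics)
    ]


def update_heuristics(heuristics, mask, reward, alpha=0.1, gamma=0.9):
    """Hypothetical scheme: TD(0) on the option each feature actually took.

    The subset's fitness arrives as a reward at the last feature only;
    earlier features bootstrap from the best option of the next feature.
    """
    updated = [list(pair) for pair in heuristics]
    last = len(mask) - 1
    for j, choice in enumerate(mask):
        next_value = max(heuristics[j + 1]) if j < last else 0.0
        step_reward = reward if j == last else 0.0
        updated[j][choice] = td0_update(
            heuristics[j][choice], step_reward, next_value, alpha, gamma
        )
    return [tuple(pair) for pair in updated]


# Usage: build one subset, then feed its (assumed) fitness back as a reward.
rng = random.Random(0)
pheromones = [(1.0, 1.0)] * 4
heuristics = [(1.0, 1.0)] * 4
mask = construct_subset(pheromones, heuristics, rng)
heuristics = update_heuristics(heuristics, mask, reward=0.8)
```

The point of the sketch is the contrast with a fixed heuristic: here the heuristic values change after every evaluated subset, so later ants sample from probabilities shaped by earlier outcomes.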
