
March 5, 2026 | Open Access

Balanced Index-Encoding Genetic Algorithm for Extreme Prototype Reduction in k-Nearest Neighbor Classification

Key Points

The aim is to develop an efficient prototype selection strategy for K-NN classification that reduces memory footprint while maintaining accuracy.
Used a compact genetic algorithm (GA) to evolve a fixed number of prototype indices per class.
Evaluated performance on synthetic datasets (2D Gaussians and a 3D three-moons geometry) and on public real-world benchmarks.
Compared against K-NN baselines with k ∈ {1, 3, 5, 7} that use all design samples, and ablated the GA operators.
Conducted 30 independent runs per scenario, with non-parametric significance testing (paired Wilcoxon with Holm correction; Friedman where applicable).
GA-selected prototype banks were often orders of magnitude smaller than the full design sets, yet matched or improved accuracy.
Statistically supported wins were observed against established K-NN baselines.
In the hardest scenarios, the GA achieved substantial prototype compression with no accuracy loss relative to the best baseline.
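The key points describe the GA only at a high level: it evolves a fixed number of prototype indices per class, and the selected prototypes feed a 1-NN classifier. The following is a minimal sketch under stated assumptions. The chromosome stores exactly m distinct design-set indices per class. Fitness counts correct 1-NN classifications on an evaluation set. Variation uses per-class block crossover, same-class index-replacement mutation, binary tournament selection and elitism. These operators and every name here (`evolve`, `m`, `rate`, `X_eval`, and so on) are hypothetical illustrations, not the paper's GA1/GA2/GA3 variants.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    # Squared Euclidean distance; it ranks neighbors the same way as Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_nn_correct(prototypes, X_eval, y_eval):
    # Fitness (assumed): number of evaluation points labeled correctly by 1-NN over the prototypes.
    correct = 0
    for x, y in zip(X_eval, y_eval):
        _, label = min(prototypes, key=lambda p: sq_dist(p[0], x))
        if label == y:
            correct += 1
    return correct


def class_index_pools(y_design):
    # Map each class label to the design-set indices belonging to that class.
    pools = defaultdict(list)
    for i, label in enumerate(y_design):
        pools[label].append(i)
    return dict(pools)


def random_chromosome(pools, m, rng):
    # Balanced index encoding: exactly m distinct design indices per class.
    return {c: rng.sample(idx, m) for c, idx in pools.items()}


def decode(chrom, X_design, y_design):
    # Turn the index genes into a (feature vector, label) prototype bank.
    return [(X_design[i], y_design[i]) for genes in chrom.values() for i in genes]


def crossover(a, b, rng):
    # Hypothetical per-class block crossover: each class block comes whole from one parent,
    # so the child still holds m distinct indices per class without any repair step.
    return {c: list(a[c] if rng.random() < 0.5 else b[c]) for c in a}


def mutate(chrom, pools, rate, rng):
    # Hypothetical mutation: with probability `rate`, swap a gene for an unused index of the same class.
    child = {c: list(genes) for c, genes in chrom.items()}
    for c, genes in child.items():
        for j in range(len(genes)):
            if rng.random() < rate:
                unused = [i for i in pools[c] if i not in genes]
                if unused:
                    genes[j] = rng.choice(unused)
    return child


def evolve(X_design, y_design, X_eval, y_eval, m=1, pop_size=20,
           generations=30, rate=0.2, seed=0):
    rng = random.Random(seed)
    pools = class_index_pools(y_design)

    def fitness(chrom):
        return one_nn_correct(decode(chrom, X_design, y_design), X_eval, y_eval)

    pop = [random_chromosome(pools, m, rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(ch), ch) for ch in pop]
        _, elite = max(scored, key=lambda s: s[0])

        def pick():
            # Binary tournament on the cached fitness scores.
            (fa, a), (fb, b) = rng.sample(scored, 2)
            return a if fa >= fb else b

        # Elitism: the best chromosome survives unchanged into the next generation.
        pop = [elite] + [mutate(crossover(pick(), pick(), rng), pools, rate, rng)
                         for _ in range(pop_size - 1)]
    best_score, best = max(((fitness(ch), ch) for ch in pop), key=lambda s: s[0])
    return best, best_score
```

Calling `evolve(X_design, y_design, X_eval, y_eval, m=2)` on lists of feature tuples returns the best chromosome (a dict from class label to m design indices) and its fitness. Block crossover was chosen for the sketch because it preserves class balance and index uniqueness by construction.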

Abstract

Nearest-neighbor classifiers are accurate and easy to deploy, but their memory footprint and inference time grow with the size of the reference set. This paper studies an evolutionary prototype selection strategy for k-nearest neighbor (K-NN) classification aimed at extreme, class-balanced reduction. A compact genetic algorithm (GA) evolves a fixed number of prototype indices per class, drawn from a disjoint design partition; the selected prototypes are then used by a 1-NN classifier, with fitness defined as the number of correctly classified test instances. To address concerns about generality and baseline strength, we evaluate an experimental suite that includes synthetic 2D Gaussians (σ = 0.5 and σ = 1.0) and a 3D three-moons geometry, as well as public benchmarks spanning binary and multi-class settings and higher-dimensional data (Breast Cancer Wisconsin, Wine, Reduced MNIST/Digits 8 × 8, Forest CoverType with seven classes, and a 10D five-class spiral benchmark). We compare against K-NN baselines with k ∈ {1, 3, 5, 7} using all design samples, and include GA operator ablations (GA1/GA2/GA3). Each scenario is repeated over 30 independent runs, and we report mean ± std, min/max, per-run distributions, win/tie/loss counts, and non-parametric significance tests (paired Wilcoxon with Holm correction; Friedman where applicable). Across datasets, the GA-selected prototype banks, often orders of magnitude smaller than the full design set, match or improve accuracy, with frequent statistically supported wins against strong K-NN baselines; in the hardest cases they provide substantial compression with no loss relative to the best baseline. These results establish a reproducible baseline for extreme, class-balanced prototype reduction that is suitable for memory- and latency-constrained deployments and for fair comparison against more elaborate prototype selection methods.
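The abstract reports per-run win/tie/loss counts and paired Wilcoxon tests with Holm correction over the 30 runs. As a hedged sketch, two bookkeeping steps around those tests fit in plain Python. The first counts per-run wins, ties and losses of the GA against one baseline. The second is Holm's step-down adjustment of a family of p-values. The p-values themselves are assumed to come from a paired Wilcoxon signed-rank test computed elsewhere, for example with SciPy's `scipy.stats.wilcoxon`. The function names and the tie tolerance `tol` are hypothetical.

```python
def win_tie_loss(ga_scores, baseline_scores, tol=0.0):
    # Per-run comparison of paired accuracies; differences within `tol` count as ties.
    wins = sum(1 for g, b in zip(ga_scores, baseline_scores) if g - b > tol)
    losses = sum(1 for g, b in zip(ga_scores, baseline_scores) if b - g > tol)
    return wins, len(ga_scores) - wins - losses, losses


def holm_adjust(pvals):
    # Holm step-down adjustment; returns adjusted p-values in the input order.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone, capped at 1.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def holm_reject(pvals, alpha=0.05):
    # Reject hypothesis i when its Holm-adjusted p-value is at most alpha.
    return [p <= alpha for p in holm_adjust(pvals)]
```

Given one p-value per GA-versus-baseline comparison (for instance, one per K-NN baseline k), `holm_reject` flags which wins survive the family-wise correction, while `win_tie_loss` summarizes the same paired runs descriptively.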
