What question did this study set out to answer?

To develop an attribute reduction method for neighborhood rough sets grounded in Information Flow theory.

March 15, 2026 · Open Access

IF-EMD-SPA: An Information Flow-Based Neighborhood Rough Set Approach for Attribute Reduction

Key Points

Proposed IF-EMD-SPA, an attribute reduction method that integrates Earth Mover’s Distance (EMD) and Set Pair Analysis (SPA).
Established a unified representation framework for mixed (continuous and discrete) attributes based on classifications and an Information Channel Core.
Implemented a three-stage greedy reduction strategy consisting of dependency-driven forward selection, similarity-driven structure completion, and backward redundancy removal.
Achieved average accuracies of 93.5% with k-Nearest Neighbors (KNN), 93.9% with Support Vector Machine (SVM), and 90.8% with Classification and Regression Trees (CART).
With SVM, IF-EMD-SPA achieved the best results on all seven datasets.
With CART, IF-EMD-SPA reached 100% accuracy on Wine and WPBC, improving on comparison methods by up to 37.5 percentage points.
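
EMD and SPA are both established tools, but the summary does not say how IF-EMD-SPA combines them. The Python sketch below therefore illustrates each ingredient separately and is not the paper's similarity metric. `emd_1d` assumes two histograms over the same ordered, unit-spaced bins. In that 1-D setting the Earth Mover’s Distance reduces to the summed absolute difference of the two cumulative distributions. `spa_connection_degree` computes the standard SPA connection degree μ = a + b·i + c·j, where a, b, and c are the identical, discrepant, and contrary shares of the attributes, with j = -1 and i in [-1, 1]. The per-attribute classification rule, the thresholds `identity_tol` and `contrary_tol`, and the default i = 0 are hypothetical choices made for illustration.

```python
from typing import Sequence


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """Earth Mover's Distance between two histograms on the same ordered,
    unit-spaced bins. Both histograms are normalized to total mass 1; in 1-D
    the EMD then equals the sum of absolute differences of their CDFs."""
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    sp, sq = float(sum(p)), float(sum(q))
    cum_diff, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum_diff += pi / sp - qi / sq  # running CDF difference
        total += abs(cum_diff)        # mass that must cross to the next bin
    return total


def spa_connection_degree(x, y, identity_tol, contrary_tol, i=0.0, j=-1.0):
    """SPA connection degree mu = a + b*i + c*j for objects x and y.

    Each attribute is counted as identical (|x-y| <= identity_tol), contrary
    (|x-y| > contrary_tol), or discrepant (otherwise); a, b, c are those
    counts divided by the number of attributes, so a + b + c = 1.
    The thresholds and the default i = 0 are illustrative assumptions.
    """
    n = len(x)
    same = discrepant = contrary = 0
    for xv, yv in zip(x, y):
        d = abs(xv - yv)
        if d <= identity_tol:
            same += 1
        elif d > contrary_tol:
            contrary += 1
        else:
            discrepant += 1
    a, b, c = same / n, discrepant / n, contrary / n
    return a + b * i + c * j


print(emd_1d([1, 0, 0], [0, 0, 1]))  # → 2.0
print(spa_connection_degree([0.1, 0.5, 0.9], [0.1, 0.6, 0.1], 0.05, 0.5))  # → 0.0
```

In the first example, all the mass has to move two bins, so the EMD is 2. In the second example, one attribute is identical, one is discrepant, and one is contrary. With i = 0 the identical and contrary shares cancel, giving μ = 0.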

Abstract

High-dimensional mixed data often lack a unified semantic representation for continuous and discrete attributes, which hinders mixed-attribute similarity modeling and can result in unstable reducts and overfitting in existing neighborhood rough set (NRS) methods. To address this issue, we propose IF-EMD-SPA, an attribute reduction method for NRS grounded in Information Flow theory. Unlike conventional NRS methods that rely on discretization or a single reduction criterion, IF-EMD-SPA first establishes a unified representation framework for heterogeneous attributes based on classifications and an Information Channel Core. It then integrates Earth Mover’s Distance (EMD) and Set Pair Analysis (SPA) to define a similarity metric for mixed attributes. In addition, a three-stage greedy reduction strategy is designed under the dual constraints of dependency preservation and structural error, consisting of dependency-driven forward selection, similarity-driven structure completion, and backward redundancy removal. Experiments on five UCI benchmark datasets and two high-dimensional gene expression datasets show that IF-EMD-SPA achieves average accuracies of 93.5% (k-Nearest Neighbors, KNN), 93.9% (Support Vector Machine, SVM), and 90.8% (Classification and Regression Trees, CART), with SVM achieving the best results on all seven datasets. Under CART, it reaches 100% accuracy on Wine and WPBC, improving performance by up to 37.5 percentage points over comparison methods.
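
The sketch below is a rough illustration of the dependency criterion behind the first and third stages. It uses the textbook neighborhood rough set dependency γ_B(D) = |POS_B(D)| / |U| with Euclidean δ-neighborhoods on numeric attributes. It is not IF-EMD-SPA itself. The Information Flow representation, the EMD/SPA similarity, the structural-error constraint, and the similarity-driven structure completion stage are not specified in the abstract, so they are omitted. The function names, the default `delta = 0.15`, the tolerance `eps`, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Row = Sequence[float]


def neighborhood(data: Sequence[Row], i: int, attrs: Sequence[int], delta: float) -> List[int]:
    """Indices of samples within Euclidean distance delta of sample i on attrs."""
    xi = data[i]
    return [
        k for k, xk in enumerate(data)
        if math.sqrt(sum((xi[a] - xk[a]) ** 2 for a in attrs)) <= delta
    ]


def dependency(data, labels, attrs, delta) -> float:
    """gamma_B(D) = |POS_B(D)| / |U|.

    A sample is in the positive region when every sample in its
    delta-neighborhood shares its decision label. With no attributes,
    every neighborhood is the whole universe.
    """
    pos = sum(
        1 for i in range(len(data))
        if all(labels[k] == labels[i] for k in neighborhood(data, i, attrs, delta))
    )
    return pos / len(data)


def greedy_reduct(data, labels, delta=0.15, eps=1e-9) -> List[int]:
    all_attrs = list(range(len(data[0])))
    target = dependency(data, labels, all_attrs, delta)
    reduct: List[int] = []
    current = dependency(data, labels, reduct, delta)

    # Stage 1 (forward selection): add the attribute with the largest
    # dependency gain until the full attribute set's dependency is reached.
    while current < target - eps:
        best_attr, best_dep = None, current
        for a in all_attrs:
            if a in reduct:
                continue
            dep = dependency(data, labels, reduct + [a], delta)
            if dep > best_dep + eps:
                best_attr, best_dep = a, dep
        if best_attr is None:  # no single attribute helps; stop early
            break
        reduct.append(best_attr)
        current = best_dep

    # Stage 2 (similarity-driven structure completion) is omitted here:
    # the abstract does not specify it.

    # Stage 3 (backward redundancy removal): drop any attribute whose
    # removal does not lower the dependency reached in stage 1.
    for a in list(reduct):
        rest = [b for b in reduct if b != a]
        if dependency(data, labels, rest, delta) >= current - eps:
            reduct = rest
    return sorted(reduct)


# Toy data: attribute 0 separates the classes, 1 is noise, 2 copies 0.
data = [
    [0.0, 0.3, 0.0],
    [0.1, 0.9, 0.1],
    [0.9, 0.2, 0.9],
    [1.0, 0.8, 1.0],
]
labels = [0, 0, 1, 1]
print(greedy_reduct(data, labels))  # → [0]
```

Attribute 0 alone already reaches the full-set dependency of 1.0. Its duplicate, attribute 2, therefore adds no gain, and the noisy attribute 1 yields a dependency of 0. The sketch keeps only `[0]`.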
