Applications that process high-dimensional streaming data commonly rely on online streaming feature selection (OSFS). In practice, however, data are often incomplete because of equipment failures and technical constraints, which poses a significant challenge. Online sparse streaming feature selection (OS²FS) addresses this issue by imputing missing data via latent factor analysis. Nevertheless, existing OS²FS approaches have considerable limitations in feature evaluation, which degrades their performance. To address these shortcomings, this paper introduces a novel genetic algorithm-based online sparse streaming feature selection method (GA-OS²FS) for data streams, which integrates two key innovations: (1) imputation of missing values using a latent factor analysis model, and (2) application of a genetic algorithm to assess feature importance. Comprehensive experiments on six real-world datasets show that GA-OS²FS outperforms state-of-the-art OSFS and OS²FS methods, consistently attaining higher accuracy through the selection of optimal feature subsets.
Liu et al. (Mon,) studied this question.
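The two components named in the abstract, latent-factor imputation followed by genetic-algorithm feature evaluation, can be illustrated with a minimal sketch. This is not the paper's GA-OS²FS procedure: the function names (`lfa_impute`, `centroid_accuracy`, `ga_select`), the SGD-trained factorization, the nearest-centroid fitness function, the subset-size penalty, and all hyperparameters are assumptions chosen for illustration, and the sketch works on a static matrix rather than on features arriving one at a time.

```python
import random


def lfa_impute(X, rank=2, epochs=200, lr=0.01, reg=0.05, seed=0):
    """Fill None entries of X (list of rows) with a rank-`rank` latent
    factor model fitted by SGD on the observed entries only (assumed setup)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(m)]
    observed = [(i, j) for i in range(n) for j in range(m) if X[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = X[i][j] - sum(P[i][k] * Q[j][k] for k in range(rank))
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    # Observed values are kept as-is; only missing cells take the prediction.
    return [
        [X[i][j] if X[i][j] is not None
         else sum(P[i][k] * Q[j][k] for k in range(rank))
         for j in range(m)]
        for i in range(n)
    ]


def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier that uses only
    the features whose mask bit is 1 (a stand-in fitness signal)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for row, label in zip(X, y):
        pred = min(
            centroids,
            key=lambda c: sum((row[j] - cj) ** 2 for j, cj in zip(cols, centroids[c])),
        )
        correct += pred == label
    return correct / len(X)


def ga_select(X, y, pop_size=20, generations=30, mut_rate=0.1, penalty=0.01, seed=0):
    """Evolve a 0/1 feature mask; requires at least two features."""
    rng = random.Random(seed)
    m = len(X[0])

    def fitness(mask):
        # Accuracy minus a small (assumed) penalty per selected feature.
        return centroid_accuracy(X, y, mask) - penalty * sum(mask)

    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]          # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, m)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mut_rate) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

A typical call would first run `lfa_impute` on a matrix whose missing cells are `None`, then pass the completed matrix and the labels to `ga_select`; the returned list marks the selected features with 1. Scoring masks on training accuracy keeps the sketch short, whereas a held-out split would be the safer choice in practice.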