Feature selection is an essential technique in data mining, yet despite its significance it is often confined to batch learning or to idealized online data scenarios. Existing online feature selection methods make specific assumptions about the data stream, such as requiring a fixed feature space with an explicit pattern and complete labeling of samples. Unfortunately, data streams generated in many real scenarios commonly exhibit arbitrarily incomplete feature spaces and scarce labels, making existing approaches unsuitable for real applications. To fill these gaps, this study proposes a new problem called Online Feature Selection with Varying Feature Spaces (OFSVF). The main idea of OFSVF is three-fold: 1) it leverages a Gaussian copula to model the correlations among incomplete features in a complete latent space encoded by continuous variables; 2) it employs a novel tree-ensemble-based approach to select the most informative features on the fly; and 3) it exploits the underlying geometric structure of instances to establish the relationship between unlabeled and labeled instances. Experimental results demonstrate the feasibility and effectiveness of the proposed method.
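To make the first idea more concrete, the following is a minimal sketch of how a Gaussian copula can relate incompletely observed features through a continuous latent space. It is not the authors' algorithm, whose estimation procedure the abstract does not describe. It assumes a simplified scheme: each feature's observed values are mapped to standard-normal scores through their empirical ranks, and a latent correlation is computed only over the rows where both features are present. The function names `to_latent` and `latent_correlation` are hypothetical, and missing entries are represented by `None`.

```python
import math
from statistics import NormalDist


def to_latent(column):
    """Map the observed values of one feature to standard-normal scores.

    Missing entries (None) stay None. This is an illustrative assumption,
    not the estimation procedure used in the paper.
    """
    observed = [v for v in column if v is not None]
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    latent = []
    for v in column:
        if v is None:
            latent.append(None)
            continue
        # Empirical rank of v among observed values, scaled by n + 1
        # so the probability stays strictly inside (0, 1).
        rank = sum(1 for o in observed if o <= v)
        latent.append(std_normal.inv_cdf(rank / (n + 1)))
    return latent


def latent_correlation(col_a, col_b):
    """Pearson correlation of latent scores over rows where both are observed."""
    za, zb = to_latent(col_a), to_latent(col_b)
    pairs = [(a, b) for a, b in zip(za, zb) if a is not None and b is not None]
    if len(pairs) < 2:
        return float("nan")
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Two features with different missing entries.
    x = [1.0, 2.0, None, 4.0, 5.0]
    y = [10.0, 20.0, 30.0, None, 50.0]
    print(latent_correlation(x, y))
```

Because the mapping depends only on ranks, the latent scores are unaffected by monotone rescaling of a feature, which is one reason a copula is a natural fit for features of mixed scale. A full treatment would estimate the whole latent correlation matrix jointly and update it as the stream evolves.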
Zhuo et al. (Mon,) studied this question.