Feature selection is a critical stage in machine learning pipelines, yet the process is often complex and loosely structured. It typically involves multiple iterative steps, such as data pre-processing, relevance determination, and model-based evaluation, that are rarely captured by consistent metadata or transparent decision criteria. As a result, effective feature selection frequently depends on substantial domain expertise and is commonly performed and validated through manual, ad hoc procedures. To address these challenges, we present a metadata-driven agentic system for explainable and reproducible feature selection. The system integrates structured metadata, transparent audit trails, and a generative AI agent for analysis, reporting, and knowledge extraction.
Bikaki et al. studied this question.