Nontargeted analysis using gas chromatography coupled with electron ionization high-resolution mass spectrometry (GC-EI-HRMS) is a vital tool for identifying large numbers of compounds in complex environmental samples. Herein, we employed GC-EI-HRMS to profile chemicals in wastewaters from an iron and steel corporation. To address the challenge of quantifying compounds without authentic standards, we developed a stacked ensemble learning model that uses bootstrap aggregation to integrate the predictions of three distinct base learners. A total of 910 compounds were tentatively identified across all wastewater samples and were primarily categorized into 19 subgroups. Based on these major categories, 278 reference standards were used to develop the stacked ensemble learner. This model outperformed surrogate-based semiquantitative methods, with 95% of quantification errors falling within a 3.79-fold range. Total quantified concentrations ranged from 8.86 × 10⁵ μg/L (influent) to 7.01 × 10³ μg/L (effluent) in coking wastewater and from 331 μg/L (influent) to 47.1 μg/L (effluent) in mixed wastewater. Notably, 97.1% of the tentatively identified chemicals fell within the model's applicability domain. To facilitate application of the model, a user-friendly predictor, EIQuan, was developed, providing an efficient tool for predicting response factors of GC-amenable environmental pollutants. This study establishes a robust framework for accurate semiquantification of unknown pollutants in complex industrial wastewater.
Liu et al. studied this question.
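The statement that 95% of quantification errors fall within a given fold range can be illustrated with a minimal sketch. It assumes the fold error of a prediction is the larger of predicted/measured and measured/predicted, and that the 95% bound is taken as a nearest-rank percentile; the abstract does not give the exact definition used in the study, so both choices are assumptions. The function names and the example concentrations are hypothetical and are not data from the study.

```python
import math


def fold_error(predicted, measured):
    """Symmetric fold error (assumed definition): max(p/m, m/p), always >= 1."""
    if predicted <= 0 or measured <= 0:
        raise ValueError("concentrations must be positive")
    return max(predicted / measured, measured / predicted)


def fold_range(predicted, measured, coverage=0.95):
    """Smallest fold error that bounds `coverage` of the pairs.

    Uses a nearest-rank percentile, which is an assumption of this sketch.
    """
    errors = sorted(fold_error(p, m) for p, m in zip(predicted, measured))
    if not errors:
        raise ValueError("at least one predicted/measured pair is required")
    rank = math.ceil(coverage * len(errors))  # 1-based nearest rank
    return errors[rank - 1]


# Hypothetical predicted vs. measured concentrations (ug/L), illustration only.
predicted = [10.0, 5.0, 2.0, 8.0]
measured = [5.0, 5.0, 8.0, 8.0]
print(fold_range(predicted, measured))
```

Under these assumptions, a reported value of 3.79 would mean that 95% of predicted concentrations lie between 1/3.79 and 3.79 times the measured value.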