Reliable environmental sensor data are essential to accurate urban climate modeling and evidence-based planning. Conventional physics-based quality control (QC) methods apply fixed thresholds to flag physically implausible values, but they often miss subtle, context-dependent anomalies. This study introduces a hybrid QC framework that combines conservative physical constraints with a probabilistic machine-learning approach based on Positive-Unlabeled XGBoost (PU-XGBoost). Using data from the CROCUS Urban Integrated Field Laboratory in Chicago, the framework produces anomaly probabilities rather than binary flags, allowing confidence-weighted data evaluation. The results show that the hybrid method captures both gross sensor errors and latent errors overlooked by rule-based QC, while remaining interpretable through physically informed features. Feature importance analysis highlights the dominant roles of temporal statistics, sensor type, and environmental context in anomaly detection. Overall, the proposed framework provides a scalable and interpretable foundation for self-adaptive quality assurance in next-generation urban environmental sensing networks.
Lee et al. studied this question.
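The abstract does not spell out how the physical constraints and the PU classifier are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' implementation. It assumes that readings failing a conservative range check are treated as the labeled positives (known anomalies), that all other readings are unlabeled, and that a classifier (for example, the `predict_proba` output of an `xgboost.XGBClassifier` trained to separate labeled from unlabeled samples) has already produced a raw score per reading. The raw scores are rescaled with the Elkan-Noto correction, which is a swapped-in PU technique chosen for illustration. The temperature bounds and all function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical conservative bounds for air temperature in degrees Celsius.
# These values are illustrative assumptions, not thresholds from the study.
TEMP_MIN_C = -50.0
TEMP_MAX_C = 60.0


def physical_flag(value: float, lo: float = TEMP_MIN_C, hi: float = TEMP_MAX_C) -> bool:
    """Return True when a reading is outside the physically plausible range.

    NaN readings are also flagged, because every comparison with NaN is False.
    """
    return not (lo <= value <= hi)


def pu_calibrate(raw_scores: Sequence[float],
                 labeled_positive_scores: Sequence[float]) -> List[float]:
    """Rescale P(labeled | x) to an estimate of P(anomaly | x).

    Elkan-Noto correction (an assumption here): divide each raw score by c,
    the mean raw score on held-out labeled positives, and cap the result at 1.
    """
    if not labeled_positive_scores:
        raise ValueError("need at least one labeled positive to estimate c")
    c = sum(labeled_positive_scores) / len(labeled_positive_scores)
    if c <= 0.0:
        raise ValueError("mean score on labeled positives must be positive")
    return [min(1.0, s / c) for s in raw_scores]


def hybrid_anomaly_probability(values: Sequence[float],
                               raw_scores: Sequence[float],
                               labeled_positive_scores: Sequence[float]) -> List[float]:
    """Combine the hard physical check with the calibrated PU score.

    A physically implausible reading gets probability 1.0; every other
    reading keeps its calibrated classifier probability.
    """
    if len(values) != len(raw_scores):
        raise ValueError("values and raw_scores must have the same length")
    calibrated = pu_calibrate(raw_scores, labeled_positive_scores)
    return [1.0 if physical_flag(v) else p for v, p in zip(values, calibrated)]
```

Under these assumptions, a downstream user could weight or filter readings by the returned probability instead of relying on a binary flag, which is the confidence-weighted evaluation the abstract describes.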