Abstract

Building fires are pervasive, high-consequence events, yet current inspection workflows remain inefficient. We propose a cognitively guided hybrid-optimization method that operationalizes modular prompt engineering for open-source visual–language models (VLMs) to automate building fire-hazard identification. Grounded in the ACT-R architecture, the approach decomposes professional reasoning into five optimizable modules and searches the discrete prompt space via a two-stage Bayesian–genetic procedure. Evaluated on 612 images spanning four hazard categories (structural damage, evacuation route, fire equipment missing, and debris accumulation), the system achieves 90.75% macro-F1 with 94.96% recall. Using zero training data, it outperforms LoRA fine-tuning (86.35% macro-F1 with 100 training images) while matching proprietary models and retaining the flexibility of open-source VLMs. The results show that methodical prompt modularization and hybrid optimization can elicit professional-level performance in safety-critical tasks without model retraining, providing a scalable and practical computational pipeline for AI-assisted urban building safety supervision.
Zhang et al. studied this question.
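For readers unfamiliar with the headline metric, the following is a minimal sketch of how a macro-F1 score can be computed over the four hazard categories named in the abstract. It assumes each image carries exactly one label, which the abstract does not state; the function name `macro_f1` and the toy labels are illustrative only and are not taken from the paper's code or data.

```python
from typing import Sequence

# The four hazard categories named in the abstract.
CATEGORIES = [
    "structural damage",
    "evacuation route",
    "fire equipment missing",
    "debris accumulation",
]


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = CATEGORIES) -> float:
    """Unweighted mean of per-class F1 scores (assumes one label per image)."""
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy data invented for illustration; not results from the paper.
truth = ["structural damage", "structural damage", "evacuation route",
         "fire equipment missing", "debris accumulation"]
preds = ["structural damage", "evacuation route", "evacuation route",
         "fire equipment missing", "debris accumulation"]
score = macro_f1(truth, preds)
```

Because every class contributes equally to the average, a rare hazard category counts as much as a common one, which is presumably why a macro average suits a safety-critical task with uneven class frequencies.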