Early detection is a key tool for mitigating the devastating effects of wildfires. Single-frame detection methods that ignore inter-frame dependencies often fail to detect smoke plumes at the earliest stage and at greater distances, or they produce excessive false alarms. Biological vision is particularly sensitive to motion cues, and this principle translates well to automated systems. Recent temporal-memory approaches have outperformed purely spatial methods, but they typically rely on complex, computationally heavy multi-stage architectures. This study investigates encoding temporal and contextual information into additional image channels as a basis for building data models with greater information content. Seven distinct data models were proposed, and a dataset was generated for each to train standard YOLO architectures without modifying the network structure. The datasets were compiled from footage collected by an operational wildfire surveillance system in Croatia and comprise 333 annotated sequences of real fires recorded between 2018 and 2024. The experimental evaluation compared YOLO models trained on the information-enriched datasets with YOLO models trained on standard RGB images. Based on the results, the best data model for early wildfire smoke detection was selected. It combines the original RGB channels with short-term and long-term temporal memory. Models trained on spatio-temporal data achieved up to a 5 percent higher true-positive detection rate than models trained on standard RGB images, while maintaining low inference latency. The proposed approach shifts the focus to the structure and information content of the data while preserving the efficiency of standard convolutional neural network architectures.
This approach could be applied to other problems requiring high efficiency and real-time operation, where temporal and contextual information can improve detection performance.
Krstinić et al. studied this question.
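The abstract does not specify how the short-term and long-term memory channels are computed. The following is a minimal sketch of the general idea of stacking temporal memory next to the RGB channels. It assumes that short-term memory is the absolute grayscale difference to the previous frame and that long-term memory is the absolute difference to an exponential-moving-average background. These encodings, the function names, and the `alpha` parameter are hypothetical illustrations, not the authors' data model.

```python
import numpy as np


def to_gray(frame: np.ndarray) -> np.ndarray:
    """Average the RGB channels into a float32 luminance proxy of shape (H, W)."""
    return frame.astype(np.float32).mean(axis=2)


def stack_temporal_channels(frames, alpha=0.05):
    """Return one (H, W, 5) uint8 array per frame: R, G, B, short-term, long-term.

    Hypothetical encoding (an assumption, not the paper's definition):
      short-term memory = |gray_t - gray_{t-1}|
      long-term memory  = |gray_t - background_t|, where background_t is an
                          exponential moving average of earlier gray frames.
    """
    if not frames:
        return []
    prev = to_gray(frames[0])   # the first frame has no predecessor, so it is its own reference
    background = prev.copy()    # initialise the long-term background with the first frame
    out = []
    for frame in frames:
        gray = to_gray(frame)
        short_term = np.abs(gray - prev)        # fast changes between consecutive frames
        long_term = np.abs(gray - background)   # slow drift relative to the scene background
        # dstack promotes the 2-D memory maps to (H, W, 1) and appends them after RGB
        stacked = np.dstack([frame.astype(np.float32), short_term, long_term])
        out.append(np.rint(np.clip(stacked, 0, 255)).astype(np.uint8))
        # Update both memories only after the current frame has been encoded
        prev = gray
        background = (1.0 - alpha) * background + alpha * gray
    return out
```

A small `alpha` makes the background adapt slowly, so gradually rising smoke stays visible in the long-term channel after the short-term difference has faded. The sketch does not address how a five-channel array would be presented to a YOLO model.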