What question did this study set out to answer?

This research aims to address the limitations of current machine learning approaches in wildfire spread prediction by creating a standardized spatiotemporal dataset.

June 17, 2026 · Open Access

WildfireCube: A Dense Spatiotemporal Tensor to Support Multi-Regime Wildfire Spread Modeling at 30 m/3 h Resolution

Key Points

Developed WildfireCube, a reproducible pipeline that builds dense fourth-order (T, C, H, W) tensors at 30 m spatial and 3 h temporal resolution from four open data sources.
Introduced a physics-informed normalization that maps every channel to a bounded range using fixed physical constants, making values comparable across events and exactly invertible.
Applied the pipeline to 13 wildfire events in the United States, Canada, and Greece (2017–2023).
Produced a processed catalog exceeding 300 GB, stored as chunked Zarr archives with Zstandard compression at a 2.58× compression ratio.
Generated event tensors spanning a 14-fold range in burned area, a 27 °C range in mean temperature, and several distinct fire regimes.
This methodology lays the groundwork for training deep learning models for spatiotemporal wildfire spread prediction.
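The physics-informed normalization can be illustrated as a fixed linear map per channel of a (T, C, H, W) tensor. This is a minimal sketch, not the WildfireCube implementation: the channel names, the bounds in `PHYSICAL_BOUNDS`, and the helpers `normalize_cube` and `denormalize_cube` are hypothetical, and it assumes simple min-max scaling to [0, 1] with constants wide enough that no observed value falls outside them (clipping would break invertibility). Because the constants do not depend on the data, a 30 °C reading maps to the same normalized value in every event, which per-event z-scoring would not guarantee.

```python
import numpy as np

# Hypothetical fixed physical bounds (lo, hi) per channel. Illustrative
# assumptions only; not the constants used by WildfireCube.
PHYSICAL_BOUNDS = {
    "elevation_m": (-500.0, 9000.0),
    "slope_deg": (0.0, 90.0),
    "wind_u_ms": (-40.0, 40.0),
    "temperature_c": (-60.0, 60.0),
}


def _bounds(channels):
    """Return lo/hi arrays shaped (1, C, 1, 1) to broadcast over (T, C, H, W)."""
    lo = np.array([PHYSICAL_BOUNDS[c][0] for c in channels]).reshape(1, -1, 1, 1)
    hi = np.array([PHYSICAL_BOUNDS[c][1] for c in channels]).reshape(1, -1, 1, 1)
    return lo, hi


def normalize_cube(cube, channels):
    """Map each channel of a (T, C, H, W) cube linearly from [lo, hi] to [0, 1]."""
    lo, hi = _bounds(channels)
    return (cube - lo) / (hi - lo)


def denormalize_cube(z, channels):
    """Invert normalize_cube (exact up to floating-point rounding)."""
    lo, hi = _bounds(channels)
    return z * (hi - lo) + lo


# Usage: a random toy cube with T=4 time steps on an 8 x 8 grid.
rng = np.random.default_rng(0)
channels = list(PHYSICAL_BOUNDS)
raw = np.stack(
    [rng.uniform(lo, hi, size=(4, 8, 8)) for lo, hi in PHYSICAL_BOUNDS.values()],
    axis=1,
)  # shape (T, C, H, W) = (4, 4, 8, 8)
z = normalize_cube(raw, channels)
restored = denormalize_cube(z, channels)
```

Since no statistic is estimated from the event itself, the same constants can be applied to any new event, and the inverse recovers physical units directly from model outputs.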

Abstract

Machine learning approaches to wildfire spread prediction are constrained by the lack of standardized, multi-source spatiotemporal datasets that fuse terrain, weather, and fire-state information into a single ML-ready format. We present WildfireCube, a reproducible, event-centric pipeline and methodology for constructing dense fourth-order spatiotemporal tensors of shape (T, C, H, W) at 30 m spatial and 3 h temporal resolution. Following the analysis-ready data convention established in the Earth Observation community, the pipeline fuses four open data sources: the Copernicus GLO-30 Digital Elevation Model for static terrain derivatives, ERA5-Land reanalysis for hourly weather forcing, Sentinel-2 Level-2A imagery for spectral vegetation and burn-severity indices, and NASA FIRMS active-fire hotspot detections for fire-state reconstruction via ordinary kriging. The resulting 13-channel normalized tensor separates causal drivers into three physically motivated groups: static landscape controls (elevation, slope, aspect, fuel load), dynamic atmospheric forcings (wind components, temperature, precipitation), and the evolving fire state (fire-front mask, burn severity, fractional burn, observation confidence). A physics-informed normalization framework maps all channels to bounded ranges using fixed physical constants rather than sample statistics, ensuring cross-event comparability and exact invertibility. We demonstrate the pipeline on 13 wildfire events across the United States, Canada, and Greece (2017–2023), producing a processed catalog that exceeds 300 GB compressed and spans a 14-fold range in burned area, a 27 °C range in mean temperature, and several distinct fire regimes. Event tensors are stored in chunked Zarr archives with Zstandard compression, achieving a 2.58× compression ratio.
In future work, the pipeline will be applied to a 40-event target catalog, projected to exceed 2 TB of raw data, to provide the multi-regime diversity and scale needed to train robust deep learning models for spatiotemporal wildfire prediction.
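The fire-state reconstruction step interpolates sparse FIRMS hotspot detections onto the 30 m grid by ordinary kriging. The NumPy sketch below is illustrative only. It assumes detections already projected to metres, a hypothetical exponential variogram with no nugget (`sill` and `range_m` are illustrative values, not fitted ones), and a generic per-detection value to interpolate; which quantity WildfireCube actually krigs, and how it fits the variogram, are not stated here. The weights solve the standard ordinary kriging system, bordered by a Lagrange multiplier that forces them to sum to one so the estimator stays unbiased.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_m=300.0):
    """Hypothetical exponential variogram (no nugget); parameters are illustrative."""
    return sill * (1.0 - np.exp(-h / range_m))


def ordinary_kriging(obs_xy, obs_val, target_xy, variogram=exponential_variogram):
    """Ordinary kriging of point values onto target locations.

    obs_xy: (n, 2) observation coordinates in metres
    obs_val: (n,) observed values
    target_xy: (m, 2) target coordinates in metres
    Returns an (m,) array of kriged estimates.
    """
    n = len(obs_xy)
    # Distances observation-observation (n, n) and observation-target (n, m).
    d_oo = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    d_ot = np.linalg.norm(obs_xy[:, None, :] - target_xy[None, :, :], axis=-1)
    # Left-hand side: semivariances bordered by ones for the sum-to-one constraint.
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = variogram(d_oo)
    lhs[n, n] = 0.0
    # Right-hand side: one column per target location.
    rhs = np.ones((n + 1, len(target_xy)))
    rhs[:n, :] = variogram(d_ot)
    solution = np.linalg.solve(lhs, rhs)
    weights = solution[:n, :]  # (n, m); each column sums to one
    return weights.T @ obs_val


# Usage: three hypothetical detections kriged onto a 10 x 10 grid of 30 m cells.
obs_xy = np.array([[0.0, 0.0], [150.0, 60.0], [240.0, 270.0]])
obs_val = np.array([1.0, 0.8, 0.3])
centres = (np.arange(10) + 0.5) * 30.0
gx, gy = np.meshgrid(centres, centres)  # rows follow y, columns follow x
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
surface = ordinary_kriging(obs_xy, obs_val, grid_xy).reshape(10, 10)  # (H, W)
```

Without a nugget, this estimator reproduces each observation exactly at its own location. The dense solve scales as O(n³) in the number of detections, so a full-scale pipeline would plausibly restrict each target cell to nearby detections; that is an assumption, not something the abstract describes.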
