Green learning (GL) promotes sustainability in deep learning by emphasizing energy-efficient solutions and lightweight models. Implicit neural representations (INRs) for videos provide a compact and efficient approach to video representation within this paradigm. This study introduces SNeRV+, a spatiotemporal, spectra-preserving neural representation for videos that employs neural tangent kernel (NTK) analysis to enhance learning. To mitigate spectral bias in both the spatial and temporal domains, SNeRV+ employs a two-level processing approach in which separate encoder branches handle low-frequency (LF) and high-frequency (HF) components. A three-dimensional discrete wavelet transform decomposes each frame into its temporal variations, encoding LF and HF components into frame-wise embeddings. LF components, which capture static scenes and steady motion, are decoded with fixed parameters across frames, reducing temporal discrepancies and mitigating spectral bias. HF components, which encode time-varying details, are dynamically reconstructed using temporally adaptive weights that leverage LF-related parameters as prior information. This design enables a more efficient and compact representation of temporal variations. Experimental results demonstrate that SNeRV+ outperforms state-of-the-art INR-based methods in video regression, interpolation, extrapolation, and compression, in both quantitative metrics and qualitative evaluation.
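The LF/HF split described above can be illustrated with a single-level three-dimensional wavelet decomposition. The following is a minimal sketch, not the authors' implementation: the abstract does not name the wavelet, so an orthonormal Haar wavelet is assumed here, and the function names `haar_split` and `haar_dwt3d` are hypothetical. It shows that a perfectly static clip leaves every temporally high-frequency subband at zero, which is consistent with the idea that LF components carry static content while HF components carry time-varying detail.

```python
import numpy as np

def haar_split(x, axis):
    """One-level orthonormal Haar split along `axis` (its length must be even)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)    # pairwise average: low-frequency part
    high = (even - odd) / np.sqrt(2.0)   # pairwise difference: high-frequency part
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def haar_dwt3d(video):
    """Single-level 3D Haar DWT of a (T, H, W) array (Haar is an assumption).

    Returns 8 subbands keyed by 'L'/'H' per axis in (time, height, width)
    order, e.g. 'LLL' is low-pass on all axes and 'HLL' is temporally
    high-pass but spatially low-pass.
    """
    bands = {"": np.asarray(video, dtype=np.float64)}
    for axis in range(3):
        split = {}
        for key, arr in bands.items():
            low, high = haar_split(arr, axis)
            split[key + "L"] = low
            split[key + "H"] = high
        bands = split
    return bands

# Toy input: a static scene, i.e. the same random frame repeated 4 times.
rng = np.random.default_rng(0)
frame = rng.random((1, 8, 8))
static_video = np.repeat(frame, 4, axis=0)

bands = haar_dwt3d(static_video)
temporal_hf_energy = sum(
    float((b ** 2).sum()) for k, b in bands.items() if k[0] == "H"
)
print(sorted(bands))          # the 8 subband keys
print(temporal_hf_energy)     # zero for a static clip
```

Because the transform is orthonormal, the subband energies sum to the energy of the input, so the share of energy in the temporally high-pass subbands gives a rough measure of how much time-varying detail a clip contains.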
Kim et al. studied this question.