Implicit Neural Representations (INRs) have emerged as a highly active and influential research direction for modeling signals such as images, audio, and 3D scenes. Their ability to represent coordinate-based functions using Multi-Layer Perceptrons (MLPs) makes them a compelling framework for various reconstruction tasks. A natural question is whether INRs can also be leveraged for self-supervised image denoising, where only a single noisy image is available and no paired clean-noisy dataset exists. This setting is particularly relevant to practical applications in biomedicine and astronomy, where training data is scarce and clean samples are often infeasible to obtain. However, directly employing INRs for denoising is challenging and known to suffer from severe overfitting: owing to their high capacity, INRs can easily fit both the underlying signal and the noise, which undermines their denoising performance. In this thesis, we systematically analyze the structure and limitations of INRs for self-supervised (zero-shot) denoising and introduce several enhancements to improve their performance and stability. We integrate architectural modifications such as sine activation layers and Fourier feature-based inputs to enhance their ability to capture high-frequency image structures. We also investigate network topology and show that deeper, narrower architectures outperform standard INR configurations. To mitigate overfitting during optimization, we introduce structured sparsity, which regularizes the model and stabilizes convergence. Finally, we propose a bagging-based ensemble of sparse INRs, in which independently trained models are aggregated to reduce variance and improve reconstruction quality. Together, these contributions form a stable and effective INR-based framework for self-supervised image denoising that achieves higher PSNR than individual INR models.
Manasa Mangipudi (Thu,) studied this question.
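The following is a minimal NumPy sketch of three ingredients named in the abstract: Fourier feature inputs, an MLP with sine activations, and the averaging step of a bagging ensemble evaluated with PSNR. It is an illustration under stated assumptions, not the thesis implementation. The function names, the frequency scale, the layer widths, the initialization, and the value `w0 = 30` are all hypothetical choices, the network is left untrained, and the ensemble members are simulated as a clean image plus independent Gaussian noise rather than produced by trained sparse INRs.

```python
import numpy as np

def fourier_features(coords, B):
    """Map (N, 2) pixel coordinates to (N, 2*m) features [cos, sin] of 2*pi*coords@B."""
    proj = 2.0 * np.pi * coords @ B
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

def sine_mlp(x, weights, biases, w0=30.0):
    """Forward pass of an MLP with sine activations on hidden layers, linear output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.sin(w0 * (h @ W + b))
    return h @ weights[-1] + biases[-1]

def psnr(clean, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((clean - estimate) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)

# Coordinate grid for a small 16x16 image, normalized to [0, 1].
height, width = 16, 16
ys, xs = np.meshgrid(np.linspace(0, 1, height), np.linspace(0, 1, width), indexing="ij")
coords = np.stack([ys.ravel(), xs.ravel()], axis=-1)  # shape (256, 2)

# Random Fourier frequencies (the scale 10.0 is an assumed hyperparameter).
B = rng.normal(scale=10.0, size=(2, 32))
feats = fourier_features(coords, B)  # shape (256, 64)

# A "deeper, narrower" layout as an assumed example: 6 hidden layers of width 32.
w0 = 30.0
sizes = [feats.shape[1]] + [32] * 6 + [1]
weights = [
    rng.uniform(-1.0, 1.0, size=(n_in, n_out)) * np.sqrt(6.0 / n_in) / w0
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

# Untrained forward pass, shown only to illustrate the data flow.
prediction = sine_mlp(feats, weights, biases, w0=w0).reshape(height, width)

# Bagging-style aggregation. Each member output is simulated here as the clean
# image plus independent noise; in practice each would come from a separately
# trained model.
clean = 0.5 + 0.5 * np.sin(2.0 * np.pi * xs) * np.cos(2.0 * np.pi * ys)
member_outputs = clean + rng.normal(scale=0.1, size=(8, height, width))
ensemble = member_outputs.mean(axis=0)

member_psnrs = [psnr(clean, m) for m in member_outputs]
ensemble_psnr = psnr(clean, ensemble)
print(f"best single member: {max(member_psnrs):.2f} dB, ensemble: {ensemble_psnr:.2f} dB")
```

Because the simulated member errors are independent, averaging them lowers the mean squared error and so raises PSNR relative to any single member. Outputs of real trained models are typically correlated, so the gain in practice would be smaller than in this idealized setup.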