Abstract

Seismic denoising is a critical preprocessing step in exploration geophysics because random noise and acquisition artifacts frequently obscure subsurface reflectors and faults. Traditional denoising methods, such as median filtering, edge-preserving smoothing (EPS), and polynomial smoothing, suppress noise but often blur geological features in the process. This work introduces a foundation-model approach: we fine-tune a Vision Transformer masked autoencoder (ViT-MAE), pretrained on large seismic datasets (the Seismic Foundation Model, SFM), for three denoising objectives: (1) removal of 15% random noise ("ViT-Random"), (2) emulation of an edge-preserving smoothing filter ("ViT-EPS"), and (3) emulation of a polynomial smoothing filter ("ViT-Poly"). Training and validation use a real 3-D field volume of size 271 × 221 × 876 (inlines × crosslines × time samples), from which 1,084 nonoverlapping 224 × 224 patches are extracted. The patches are split 80% (≈867 patches) for training and 20% (≈217 patches) for validation, and all inputs are z-score normalized before fine-tuning. We compare the fine-tuned ViT models against classical filters using peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and mean squared error (MSE). On validation, ViT-Random achieves PSNR = 17.11 dB, SSIM = 0.5584, and MSE = 0.0201, versus 13.63 dB / 0.2609 / 0.0444 for median filtering. ViT-EPS yields PSNR = 30.31 dB, SSIM = 0.9661, and MSE = 0.0010, versus 21.15 dB / 0.7856 / 0.0079 for edge-preserving smoothing. ViT-Poly obtains PSNR = 36.56 dB, SSIM = 0.9955, and MSE = 0.0003, versus 22.41 dB / 0.8458 / 0.0065 for polynomial smoothing. Loss curves show stable convergence within 100 epochs. On an unseen Netherlands F3 cube (458 × 623 × 1001), ViT-Random achieves PSNR = 29.42 dB, SSIM = 0.8853, and MSE = 0.0011 without any additional retraining, demonstrating strong generalization to real field data.
Even where classical filters yield higher quantitative scores, ViT-Random preserves more structural detail and textural continuity. Qualitative results show that the ViT models suppress noise while preserving reflector continuity, thin-bed stratigraphy, and fault sharpness. These findings highlight the value of pretrained seismic models and demonstrate that fine-tuning on synthetic perturbations, with minimal labeled data, can yield structure-aware denoising models that generalize well to real-world seismic volumes.
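The data preparation summarized in the abstract (z-score normalization, nonoverlapping 224 × 224 patches, 15% random noise, 80/20 split) can be illustrated with a minimal NumPy sketch. Several details are assumptions, not taken from the source: the helper names are hypothetical, patches are cut from inline sections after zero-padding to a multiple of 224, normalization is applied to the whole volume, and "15% random noise" is read as additive Gaussian noise with a standard deviation of 0.15 times that of the data. The demo uses a small random stand-in volume, not real seismic data.

```python
import numpy as np

PATCH = 224  # patch edge length reported in the abstract


def zscore(x, eps=1e-8):
    """Scale an array to zero mean and unit standard deviation."""
    return (x - x.mean()) / (x.std() + eps)


def extract_patches(section, patch=PATCH):
    """Cut a 2-D section into nonoverlapping patch x patch tiles.

    Assumption: the section is zero-padded at its far edges up to the
    next multiple of `patch` before tiling.
    """
    h, w = section.shape
    padded = np.pad(section, ((0, (-h) % patch), (0, (-w) % patch)))
    rows, cols = padded.shape[0] // patch, padded.shape[1] // patch
    tiles = padded.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch, patch)


def add_random_noise(x, level=0.15, rng=None):
    """Add Gaussian noise with std = level * std(x) (assumed noise model)."""
    rng = np.random.default_rng() if rng is None else rng
    return x + level * x.std() * rng.standard_normal(x.shape)


def split_train_val(patches, train_frac=0.8, seed=0):
    """Shuffle the patches and split them into training and validation sets."""
    idx = np.random.default_rng(seed).permutation(len(patches))
    n_train = int(train_frac * len(patches))
    return patches[idx[:n_train]], patches[idx[n_train:]]


# Stand-in volume: 5 inline sections of 221 crosslines x 876 time samples.
rng = np.random.default_rng(0)
volume = rng.standard_normal((5, 221, 876))
volume_n = zscore(volume)

# Each 221 x 876 section is padded to 224 x 896 and yields 4 patches.
patches = np.concatenate([extract_patches(s) for s in volume_n])
noisy = add_random_noise(patches, rng=rng)  # network input (assumed pairing)
train, val = split_train_val(patches)
print(patches.shape, train.shape, val.shape)
```

Under the padding assumption, a 271 × 221 × 876 volume would give 271 × 4 = 1,084 patches, which matches the reported count, although the source does not state how the patches were actually cut. An 80% share of 1,084 is 867.2, consistent with the reported ≈867 / ≈217 split.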
Alharthi et al. (Tue,) studied this question.
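For readers who want to reproduce the style of comparison reported in the abstract, the sketch below computes MSE and PSNR with NumPy. It uses the standard definition PSNR = 10 log10(R² / MSE), where R is the data range. The function names are hypothetical, and the default R = 1 is an assumption: the source does not state the range used, though the reported pairs are roughly consistent with it (for example, MSE = 0.0010 corresponds to about 30 dB when R = 1).

```python
import numpy as np


def mse(reference, estimate):
    """Mean squared error between two arrays of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    return float(np.mean((reference - estimate) ** 2))


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range=1.0 is an assumption."""
    err = mse(reference, estimate)
    if err == 0.0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(data_range ** 2 / err))


# Synthetic check: a random "clean" patch plus Gaussian noise of std 0.1.
rng = np.random.default_rng(0)
clean = rng.random((224, 224))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
print(f"MSE = {mse(clean, noisy):.4f}, PSNR = {psnr(clean, noisy):.2f} dB")
```

SSIM is more involved because it compares local means, variances, and covariances over sliding windows. scikit-image provides an implementation as `skimage.metrics.structural_similarity`, which could be used alongside the two functions above.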