Deep neural networks for image classification require protection against unauthorized use and redistribution. Existing watermarking methods share a critical vulnerability: their watermarks are always active and therefore detectable, which allows adversaries to identify and remove them before deployment. We propose DormMark, a framework for image classification models that introduces delayed-activation watermarks. These watermarks remain dormant and hidden under deployment-time black-box query auditing, but activate automatically once the model is fine-tuned. Our approach uses a three-stage training paradigm: (1) embedding, which implants the watermark using triggered samples; (2) masking, which suppresses watermark functionality while preserving its latent presence; and (3) activation, which occurs through standard fine-tuning without owner intervention. This mechanism exploits neural networks’ forgetting-remembering behaviors during continued training. It creates a fragile equilibrium in which the model behaves similarly to a clean model under deployment-time black-box auditing, yet reliably manifests ownership indicators after modification. We consider a private-key black-box verification setting in which the owner keeps the concrete trigger instances secret. Experiments across multiple architectures (VGG19, ResNet-18/56, DenseNet-121, WideResNet-34) and datasets (CIFAR-10, CIFAR-100, GTSRB) demonstrate 100% watermark success rates, high imperceptibility (PSNR > 38 dB, SSIM = 0.99), negligible accuracy loss (< 0.04%), and robustness against 80% parameter pruning. DormMark represents a shift from static to conditionally activated ownership verification, providing a more robust framework for intellectual property protection than always-active watermarks.
Nie et al. (Mon,) studied this question.
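To make the evaluation metrics quoted in the abstract concrete, the following minimal sketch shows how PSNR between a clean and a triggered image, and a watermark success rate, might be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `psnr` and `watermark_success_rate` are hypothetical, pixel values are assumed to lie in [0, 255], and the watermark is assumed to count as successful when a triggered sample is classified as an owner-chosen target label (the abstract does not specify the verification rule).

```python
import math
from typing import Sequence


def psnr(original: Sequence[float], triggered: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two flattened images.

    Assumes both sequences hold pixel values in [0, max_value].
    """
    if len(original) != len(triggered) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, triggered)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def watermark_success_rate(predictions: Sequence[int], target_label: int) -> float:
    """Fraction of triggered samples predicted as the target label.

    Assumption: success means "classified as the owner's target label".
    """
    if len(predictions) == 0:
        raise ValueError("predictions must be non-empty")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


# Hypothetical example: a flattened 32x32 image whose trigger shifts
# every pixel by 2 intensity levels.
clean = [100.0] * (32 * 32)
marked = [p + 2.0 for p in clean]
print(f"PSNR: {psnr(clean, marked):.2f} dB")
print(f"WSR:  {watermark_success_rate([3, 3, 3, 7], target_label=3):.2f}")
```

Under this definition, a uniform perturbation of 2 intensity levels already exceeds the 38 dB threshold reported in the abstract, which gives a rough sense of how small the trigger perturbation must be.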