Model watermarking is a technique for protecting the copyright of deep learning models. However, existing watermark methods are vulnerable to watermark attacks. In an ambiguity attack, an attacker can reverse-engineer an input from a preset output and use this input-output pair as a forged watermark. In a fine-tuning attack, an attacker can remove the watermark by fine-tuning the model. To overcome these limitations, this paper proposes a black-box watermark method called WaViR (Watermark based on Vision Reasoning). WaViR consists of three modules. In watermark construction, each original image is transformed into a hash image by a cryptographic hash function, and each original image and its hash image form an input-output pair in the watermark trigger set. In watermark embedding, the trigger set is used to train the image generation model, and simulated fine-tuning is introduced to improve the robustness of the watermark. In watermark verification, vision reasoning is applied for ownership verification: for a given image in the trigger set, verification succeeds if the SSIM between the model's output image and the corresponding hash image exceeds a threshold. Owing to the irreversibility of the hash function, an attacker cannot reverse-engineer an input that has the required hash relation with a preset output. Results show that WaViR can resist both ambiguity attacks and fine-tuning attacks.
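The construction and verification steps can be illustrated with a small, dependency-free sketch. It is not the authors' implementation, and several details are assumptions. Images are modeled as flat lists of 8-bit grayscale pixels. The image-to-hash-image mapping uses SHAKE-256 from Python's `hashlib`, an extendable-output cryptographic hash, so that the output can be stretched to one byte per pixel. SSIM is computed over the whole image as a single window rather than with the usual local Gaussian windows. The names `hash_image`, `global_ssim`, `verify`, and the threshold `0.9` are all hypothetical.

```python
import hashlib

# Assumption: images are flat lists of 8-bit grayscale pixels (0..255).


def hash_image(pixels: list[int]) -> list[int]:
    """Map an original image to a same-sized 'hash image' (hypothetical mapping).

    SHAKE-256 is applied to the raw pixel bytes and one output byte is read
    per pixel. Preimage resistance of the hash is what prevents an attacker
    from constructing an input that maps to a chosen output.
    """
    digest = hashlib.shake_256(bytes(pixels)).digest(len(pixels))
    return list(digest)


def global_ssim(x: list[int], y: list[int], data_range: float = 255.0) -> float:
    """Simplified SSIM: one window covering the whole image.

    Standard SSIM averages this statistic over local windows; a single
    window keeps the sketch short.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("images must be the same size with at least 2 pixels")
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )


def verify(model_output: list[int], original: list[int], threshold: float = 0.9) -> bool:
    """Ownership check: SSIM(model output, hash image of trigger) > threshold.

    The default threshold is a hypothetical placeholder, not a value from the paper.
    """
    return global_ssim(model_output, hash_image(original)) > threshold


if __name__ == "__main__":
    # One trigger-set pair: (original image, its hash image).
    original = [(i * 37) % 256 for i in range(64)]
    target = hash_image(original)
    # A watermarked model would be trained to output `target` for `original`;
    # here the ideal output is passed directly, so verification succeeds.
    print(verify(target, original))
```

Reading one hash byte per pixel makes the hash image look like noise. The intent is that a model reproducing it for a specific trigger input is strong evidence of deliberate embedding. An attacker cannot invert the hash to forge a matching input for a chosen output.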
Liu et al. studied this question.