Key points are not available for this paper at this time.
Recent research has shown that an adversary can use a surrogate model to steal the functionality of a target deep learning model even under the black-box condition and without data curation, while the existing defense mainly relies on API poisoning to disturb the surrogate training. Unfortunately, due to poisoning, the defense is achieved at the price of fidelity loss, sacrificing the interests of honest users. To solve this problem, we propose an Adversarial Fine-Tuning (AdvFT) framework, incorporating the generative adversarial network (GAN) structure that disturbs the feature representations of out-of-distribution (OOD) queries while preserving those of in-distribution (ID) ones, circumventing the need for OOD sample collection and API poisoning. Extensive experiments verify the effectiveness of the proposed framework. Code is available at github.com/Hatins/AdvFT.
Zhang et al. (Mon,) studied this question.