As a promising solution for model compression, knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency. Traditional solutions first train a full teacher model on the training data and then transfer its knowledge (labels) to supervise the learning of a compact student model. However, we find that this standard distillation paradigm incurs a serious bias issue: popular items are recommended more heavily after distillation. This effect prevents the student model from making accurate and fair recommendations, decreasing the effectiveness of RS.
Chen et al. studied this question.