What question did this study set out to answer?

This work aims to enhance object recognition in artwork through an efficient fine-tuning framework.

March 7, 2026 | Open Access

InfoMSD: an information-maximization self-distillation framework for parameter-efficient fine-tuning on artwork images

Key Points

Proposed InfoMSD, an unsupervised information-maximization self-distillation framework
Used a teacher-student architecture in which the teacher generates pseudo-labels for artworks
Trained the student on the teacher's pseudo-labels with a cross-entropy loss
Applied entropy-based regularization to sharpen predictions and balance class coverage
Updated only layer norm parameters and visual prompts, keeping all other parameters frozen
Achieved accuracy improvements of +6.43% and +3.02% over CLIP zero-shot baselines
Updated less than 1% of model parameters during fine-tuning
Achieved average accuracy gains of 1.35% and 0.96% over existing lightweight distillation methods
Demonstrated effective performance while reducing computational overhead
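
The parameter-efficiency points above can be illustrated with a small sketch. It assumes a hypothetical parameter inventory for a single ViT-style block plus a few prompt tokens; the names (`ln_1`, `ln_2`, `visual_prompt`) and all sizes are illustrative and not taken from the paper. It only shows how restricting updates to layer norm parameters and visual prompts leaves a small trainable fraction.

```python
# Toy inventory: parameter name -> number of scalar values.
# Names and sizes are hypothetical (one 768-wide transformer block plus
# 8 learnable prompt tokens); they are not InfoMSD's actual configuration.
WIDTH = 768
PARAM_SIZES = {
    "block0.attn.in_proj_weight": 3 * WIDTH * WIDTH,
    "block0.attn.out_proj.weight": WIDTH * WIDTH,
    "block0.mlp.fc1.weight": WIDTH * 4 * WIDTH,
    "block0.mlp.fc2.weight": 4 * WIDTH * WIDTH,
    "block0.ln_1.weight": WIDTH,
    "block0.ln_1.bias": WIDTH,
    "block0.ln_2.weight": WIDTH,
    "block0.ln_2.bias": WIDTH,
    "visual_prompt": 8 * WIDTH,
}

# Assumed naming convention for the parameters that stay trainable.
TRAINABLE_KEYWORDS = (".ln_", "visual_prompt")


def select_trainable(param_sizes, keywords=TRAINABLE_KEYWORDS):
    """Return the names of parameters that would be updated."""
    return {name for name in param_sizes if any(k in name for k in keywords)}


def trainable_fraction(param_sizes, trainable):
    """Share of all scalar parameters that belong to the trainable set."""
    total = sum(param_sizes.values())
    updated = sum(param_sizes[name] for name in trainable)
    return updated / total


trainable = select_trainable(PARAM_SIZES)
fraction = trainable_fraction(PARAM_SIZES, trainable)
print(f"trainable: {sorted(trainable)}")
print(f"fraction updated: {fraction:.4%}")
```

In a deep learning framework, the same selection would typically be made by disabling gradients for every parameter outside the chosen set, so that the optimizer only touches the layer norm and prompt tensors.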

Abstract

Although large-scale vision-language models have shown remarkable performance across various visual classification tasks in recent years, their substantial parameter counts and high fine-tuning costs have hindered deployment in resource-constrained cultural and artwork settings. This work specifically addresses the task of object recognition in artwork, that is, identifying semantic objects (e.g., animals, people, everyday items) depicted within paintings, sketches, and other artistic renditions, rather than classifying artistic styles or genres. To reduce the cost of adapting such models to this task, we propose InfoMSD, an unsupervised Information-Maximization Self-Distillation framework designed for parameter-efficient fine-tuning on unlabeled artwork imagery while preserving robust performance. Specifically, InfoMSD incorporates a teacher-student architecture in the self-distillation phase, where the teacher model generates pseudo-labels for artworks and the student model learns from them through a cross-entropy loss. By aligning the student's predictions with the discriminative signals from the teacher's pseudo-labels and simultaneously applying entropy-based regularization to sharpen the probability distribution and balance class coverage, the framework improves both the quality of the pseudo-labels and the discriminative capacity of the model. To enable parameter-efficient fine-tuning, only the layer norm parameters and visual prompts in the student model are updated, while the remaining parameters are frozen, significantly reducing computational overhead. Extensive experimental results on artwork datasets show that InfoMSD achieves accuracy improvements of +6.43% and +3.02% over CLIP zero-shot baselines while adjusting less than 1% of the model parameters. Compared to existing lightweight distillation methods, InfoMSD achieves average accuracy gains of 1.35% and 0.96%, respectively.
Overall, InfoMSD offers a novel, information-theoretic paradigm for unsupervised and efficient fine-tuning in object recognition within artistic imagery, balancing performance and efficiency.
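
The training objective described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes hard pseudo-labels taken as the argmax of the teacher's logits, and it uses the common information-maximization form of the entropy regularizer (minimize the average per-sample entropy to sharpen predictions, maximize the entropy of the batch-averaged prediction to balance class coverage). The function name `self_distillation_loss` and the weight `lam` are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p + EPS) for p in probs)


def self_distillation_loss(student_logits, teacher_logits, lam=1.0):
    """Cross-entropy to teacher pseudo-labels plus an entropy-based regularizer.

    Both arguments are batches: lists of per-sample logit lists.
    `lam` is an assumed weighting, not a value from the paper.
    """
    n = len(student_logits)
    student_probs = [softmax(row) for row in student_logits]

    # Assumed hard pseudo-labels: index of the largest teacher logit.
    pseudo_labels = [max(range(len(row)), key=row.__getitem__)
                     for row in teacher_logits]

    # Cross-entropy between student predictions and teacher pseudo-labels.
    ce = -sum(math.log(p[y] + EPS)
              for p, y in zip(student_probs, pseudo_labels)) / n

    # Sharpening term: average entropy of each individual prediction.
    cond_entropy = sum(entropy(p) for p in student_probs) / n

    # Class-balance term: entropy of the batch-averaged prediction.
    num_classes = len(student_probs[0])
    marginal = [sum(p[k] for p in student_probs) / n for k in range(num_classes)]
    marginal_entropy = entropy(marginal)

    total = ce + lam * (cond_entropy - marginal_entropy)
    return {"total": total, "ce": ce,
            "cond_entropy": cond_entropy, "marginal_entropy": marginal_entropy}


# Two samples, two classes; the teacher assigns one sample to each class.
teacher = [[5.0, 0.0], [0.0, 5.0]]
confident = self_distillation_loss([[10.0, 0.0], [0.0, 10.0]], teacher)
collapsed = self_distillation_loss([[10.0, 0.0], [10.0, 0.0]], teacher)
print(confident["total"], collapsed["total"])
```

Under these assumptions, a student that is confident, agrees with the teacher, and spreads its predictions across classes receives a lower loss than one that collapses onto a single class.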
