Identifying artistic style is important for cultural heritage conservation, automated analysis, and interpretation. Chinese art is highly diverse and philosophically influenced; its styles blend into one another, with subtle variations in symbolism and period, which makes computational classification difficult. AI, and deep learning in particular, appears promising for decoding intricate visual patterns in fine art. Traditional style classification methods, including CNN-based and deep matrix frameworks, rely mostly on low-level features and lack contextual or interpretive richness. For Chinese paintings these techniques are insufficient, because they cannot cope with stylistic ambiguity, regional variation, and the subtleties of brushwork and composition. Moreover, most of these models have been trained on Western art datasets and therefore make no attempt to adapt the classification logic to a specific culture. This study proposes a new self-regulatory model, built on a Vision Transformer (ViT), to improve the accuracy and interpretability of Chinese artistic style detection. The paper introduces a self-regulatory layer that adjusts the ViT's classification predictions: unlike a standard ViT, it actively refines uncertain predictions using confidence, attention, and consistency measures. This adaptation mechanism improves prediction quality and transparency, simulating human-like decision-making through an iterative feedback mechanism and reweighting of attention maps, and thereby strengthens the interpretability of the ViT architecture.
The ViT extracts intricate spatial and semantic features from image patches, while the self-regulatory layer checks prediction confidence, attention focus, and their consistency. Outputs deemed uncertain are dynamically re-routed or reprocessed, a decision-refinement capability that resembles an expert's. Applied to a curated dataset of Chinese art based on WikiArt, the system achieved an accuracy of 98.55%, a precision of 98.50%, a recall of 98.53%, and an F1-score of 98.50%. These findings suggest that the proposed model offers greater robustness, adaptability, and cultural sensitivity in fine art classification than traditional CNN and convolutional transformer models.
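To make the idea of a self-regulatory check more concrete, the following is a minimal sketch of how a gate over confidence, attention focus, and consistency could flag a prediction for reprocessing. It is not the authors' implementation. The function names, the thresholds `conf_min` and `focus_min`, the entropy-based focus score, and the reading of "consistency" as agreement between the prediction on the original image and on a second (for example augmented) view are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def attention_focus(attn: Sequence[float]) -> float:
    """Assumed focus score: 1 minus the normalized entropy of the
    per-patch attention weights. Close to 1 means attention is
    concentrated on few patches; close to 0 means it is spread uniformly."""
    total = sum(attn)
    probs = [a / total for a in attn]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))
    return 1.0 - entropy / max_entropy if max_entropy > 0 else 1.0


def is_uncertain(
    logits: Sequence[float],
    attn: Sequence[float],
    logits_second_view: Sequence[float],
    conf_min: float = 0.8,   # hypothetical confidence threshold
    focus_min: float = 0.2,  # hypothetical attention-focus threshold
) -> bool:
    """Flag a prediction for refinement if any of the three checks fails."""
    probs = softmax(logits)
    pred = argmax(probs)
    low_confidence = probs[pred] < conf_min
    diffuse_attention = attention_focus(attn) < focus_min
    # Assumed consistency check: both views must predict the same class.
    inconsistent = pred != argmax(softmax(logits_second_view))
    return low_confidence or diffuse_attention or inconsistent
```

In a full pipeline, a sample for which `is_uncertain` returns `True` would be the one that is re-routed or reprocessed, for instance with reweighted attention maps; that refinement step is not shown here.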
Guo et al. studied this question.