Image classification in low-data regimes remains a challenging problem, particularly in stylized visual domains where intra-class similarity and inter-class feature overlap limit discriminative capacity. This study presents a systematic evaluation of regularization and transfer learning strategies for multi-class comic character recognition under constrained data conditions. Four convolutional model configurations are compared: (i) a baseline CNN trained from scratch, (ii) a regularized CNN incorporating data augmentation, dropout, and early stopping, (iii) a pretrained ResNet-50 used as a fixed feature extractor, and (iv) a partially fine-tuned ResNet-50 with selective layer unfreezing. Experiments are conducted on a custom four-class dataset exhibiting moderate class imbalance and are evaluated using both a fixed 70/20/10 split and 5-fold cross-validation to assess generalization stability. Results indicate that shallow CNN architectures suffer from substantial overfitting, even when regularization is applied, whereas transfer learning significantly improves macro-averaged F1-score and out-of-distribution detection performance. Cross-validated results, the primary basis for inference given the dataset scale, show that both ResNet-50 strategies achieve the same mean accuracy of 95.0% (SD: ±0.4% for feature extraction, ±0.8% for fine-tuning; paired t = 0.00, p = 1.000), while shallow CNN architectures reach only 81–87%. Under a single fixed 70/20/10 partition (n = 69 test samples, 95% CI: ±9–12%), fine-tuning nominally reaches 98.5%; crucially, cross-validation deflates this figure to parity with feature extraction, indicating that it reflects favorable partitioning rather than genuine architectural superiority. The primary finding is therefore that frozen ResNet-50 feature extraction is the recommended strategy: it matches fine-tuning in cross-validated generalization while requiring 15× fewer trainable parameters and exhibiting lower fold-to-fold variance.
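The paired comparison of the two ResNet-50 strategies across folds can be illustrated with a minimal sketch. The function name `paired_t_statistic` and the per-fold accuracies below are hypothetical placeholders, not the study's data; they are chosen only so that both strategies share a mean of 0.95, which yields a t statistic of approximately zero, as in the reported result. The sketch computes the statistic and degrees of freedom only; a p-value would additionally require the t distribution (for example via a statistics library).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on per-fold scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 folds")
    # Pair the scores fold by fold and work with the differences.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if d_sd == 0:
        # All differences identical: t is 0 if there is no difference,
        # otherwise it is unbounded in the direction of the difference.
        t = 0.0 if d_mean == 0 else math.copysign(math.inf, d_mean)
    else:
        t = d_mean / (d_sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-fold accuracies (illustrative only, NOT the study's data).
frozen_acc = [0.950, 0.946, 0.954, 0.950, 0.950]
finetuned_acc = [0.960, 0.940, 0.950, 0.940, 0.960]

t_stat, dof = paired_t_statistic(frozen_acc, finetuned_acc)
print(f"t = {t_stat:.2f}, df = {dof}")
```

Pairing by fold matters here because both strategies are evaluated on the same partitions, so fold-specific difficulty cancels out in the differences.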
The findings demonstrate that pretrained deep residual representations transfer effectively to stylized comic imagery and that evaluation protocol selection critically impacts perceived performance in small datasets. These results provide practical guidelines for robust model selection in domain-specific, limited-data image classification tasks.
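To make the sensitivity of a single small test split concrete, the following sketch computes the width of a 95% binomial confidence interval for n = 69 test samples. The abstract does not state how its interval was obtained, so the normal (Wald) approximation and the Wilson score interval used here are assumptions, and the helper names and the count of 60 correct predictions are hypothetical. Under the Wald approximation, the worst-case half-width at n = 69 is about ±11.8%, which is roughly in line with the upper end of the quoted ±9–12% range.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def wald_half_width(p, n, z=Z_95):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def wilson_interval(correct, n, z=Z_95):
    """Wilson score interval; better behaved than Wald when p is near 0 or 1."""
    p = correct / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


n_test = 69  # test-set size reported for the fixed split

# Worst case for the Wald interval occurs at p = 0.5 (about 0.118 here).
worst_case = wald_half_width(0.5, n_test)
print(f"worst-case half-width: +/-{worst_case:.3f}")

# Hypothetical count of correct predictions, for illustration only.
low, high = wilson_interval(60, n_test)
print(f"Wilson 95% interval for 60/{n_test}: [{low:.3f}, {high:.3f}]")
```

With intervals this wide, accuracy differences of a few percentage points on a single split are difficult to distinguish from partitioning noise, which is why the cross-validated figures are treated as the primary basis for inference.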
Parrillo et al. (Thu,) studied this question.