The automated classification of Hebrew, a morphologically rich language (MRL), presents distinctive challenges, particularly when high-quality labeled data are scarce. This study investigates the interplay between handcrafted feature engineering and transformer-based representations in a low-resource news classification setting (n = 306). We evaluate classification along four distinct dimensions: domain, sentiment, gender, and source. Our methodology draws on an extensive space of 2,149 stylistic and content-based features, reduced through a systematic hill-climbing feature selection process. We compare five classical machine learning models with five BERT-based models and apply five oversampling strategies to mitigate class imbalance. The results show that under extreme data scarcity, the performance gap between deep learning and optimized classical ML becomes marginal, with stylistic features providing critical stability and interpretability. This study contributes a curated Hebrew news dataset and establishes a robust benchmark, demonstrating that linguistically aware feature engineering remains a vital component of MRL processing when large-scale fine-tuning is impractical.
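Hill-climbing feature selection can be illustrated with a generic greedy forward search. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the abstract does not say whether the climb runs over individual features or feature groups, which classifier and metric score each subset, or how ties and stopping are handled. Here `score` is a hypothetical callback that, in practice, would return something like the cross-validated accuracy of a classical model on the selected columns; the feature names, `GAINS`, and `toy_score` are invented for demonstration only.

```python
from typing import Callable, FrozenSet, Hashable, Optional, Sequence, Tuple


def hill_climb_select(
    candidates: Sequence[Hashable],
    score: Callable[[FrozenSet[Hashable]], float],
    max_size: Optional[int] = None,
) -> Tuple[FrozenSet[Hashable], float]:
    """Greedy forward hill climbing over feature subsets.

    Starts from the empty subset and repeatedly adds the single candidate
    that raises `score` the most. Stops at a local optimum (no addition
    improves the score) or once `max_size` features are selected.
    """
    selected: FrozenSet[Hashable] = frozenset()
    best = score(selected)  # baseline score with no features
    limit = len(candidates) if max_size is None else max_size
    while len(selected) < limit:
        # Examine every neighbour reachable by adding one unused feature.
        step_best, step_feature = best, None
        for feature in candidates:
            if feature in selected:
                continue
            s = score(selected | {feature})
            if s > step_best:  # strict: ties keep the earlier candidate
                step_best, step_feature = s, feature
        if step_feature is None:
            break  # local optimum reached
        selected = selected | {step_feature}
        best = step_best
    return selected, best


# --- Toy usage (hypothetical features and scores, not the paper's data) ---
GAINS = {"prefix_ratio": 0.20, "avg_word_len": 0.15, "quote_marks": 0.05}
CANDIDATES = ["avg_word_len", "noise_1", "prefix_ratio", "quote_marks", "noise_2"]


def toy_score(subset: FrozenSet[Hashable]) -> float:
    # Stand-in for cross-validated accuracy: informative features add a gain,
    # and every feature pays a small penalty, mimicking overfitting.
    return 0.5 + sum(GAINS.get(f, 0.0) for f in subset) - 0.02 * len(subset)


selected, best = hill_climb_select(CANDIDATES, toy_score)
print(sorted(selected), round(best, 2))
# → ['avg_word_len', 'prefix_ratio', 'quote_marks'] 0.84
```

Because each step may score every unused feature, a climb that runs to completion over all 2,149 features would need up to n(n+1)/2, or roughly 2.3 million, subset evaluations; like any greedy search, it can also stop at a local optimum rather than the globally best subset.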
HaCohen-Kerner et al. studied this question.